TPR3: Cross-Site Scripting: What is it, and how can you protect your site from becoming a victim?

Paul Gilzow, Programmer/Analyst-Expert, University of Missouri


The audio for this podcast can be downloaded at http://highedweb.org/2008/presentations/tpr3.mp3


[Intro Music]

Announcer:  You’re listening to one in a series of presentations from the 2008 HighEdWeb Conference in Springfield, Missouri.

Paul Gilzow:  Good morning, my name is Paul Gilzow with the Department of Web Communications at the University of Missouri.  I’ll let you know this presentation usually takes an hour and 15 minutes, we only got 45 so I might have to talk a little bit faster.  I’m going to have to ask you to hold your questions until the end if you can.  If you’ve got your laptops and you’re Twittering, I’ve created a back channel at hew08xss.  And if you want to you can post your questions there and after the presentation I’ll go back tonight, post those answers up on the Ning site.

So just a real quick poll, how many of you have heard of cross-site scripting?  Just about everybody, OK.  How many of you have a pretty good idea of what cross-site scripting is?  All right, a little bit less.  How many of you actually use cross-site scripting to exploit somebody else’s site?  A couple, OK.  The FBI would like to see you after this presentation is over.  [Laughter]  How many of you’ve done HTML development?  Pretty much everybody, everybody is familiar with HTML code.  How many of you’ve done some JavaScript development?  All right, good.

Since some of you have heard of it but not quite really sure what it is, I’m going to start at the base and work our way back up.  All right, so all browsers follow what’s called the same origin policy.  And what that means is the content of the page has to come from the same host, the same protocol and the same port number for content to interact with each other.  And this is a good thing because if you’ve got Facebook up and you’ve got your bank up, you don’t want Facebook interacting with your bank’s website, right?  OK. 
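
A minimal sketch of the same origin comparison described above, written in Python purely for illustration. Browsers implement this check internally; the function names, the default port table, and the example URLs are assumptions made for this sketch.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Assumed default ports, used when a URL does not spell the port out.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> Tuple[str, str, Optional[int]]:
    """Reduce a URL to the (protocol, host, port) triple the policy compares."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return scheme, host, port


def same_origin(a: str, b: str) -> bool:
    """True only when protocol, host and port all match."""
    return origin(a) == origin(b)


# Hypothetical URLs: same host, but a different protocol is a different origin.
print(same_origin("https://bank.example/login", "https://bank.example/account"))
print(same_origin("http://bank.example/", "https://bank.example/"))
```

The point of the sketch is that all three parts have to match; two pages on the same host but different ports or protocols are treated as strangers.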

But you’ve probably all put maybe embedded YouTube videos on your pages.  How many of you have done that?  OK, well the YouTube video doesn’t live on your server does it?  No, it’s out on YouTube’s server.  So you’ve actually got cross-domain content on your website but the browser allows that because as far as it’s concerned, all of that content has come from one domain.  In other words, your server gave out that page with all that content.  All those links, everything.  So the browser says, “OK, that’s fine.”

Well cross-site scripting attempts to exploit or bypass that same origin policy by injecting additional commands into any place you accept inputs.  So cross-site scripting is an injection attack.  It usually takes the form of either HTML code, like an image source or a frame or something like that, or actual script tags that then link out to somebody else’s domain, somebody else’s Web server, and push over those JavaScript files.

Now cross-site scripting is not an attack against your Web server.  It’s not an attack against your website and for the most part does no damage to your website.  It’s an attack against your users.  It attempts to exploit the trust that your users have for your domain.

Everybody here is higher ed, how many of you are pretty sure that if you send a link to one of your faculty members that’s on your own site, how sure are you that they’re going to trust that link?  Pretty sure, right?  Because it’s their school, they trust that their school is secure.  So we’re going to attempt to exploit that trust that the user has for the website to trick them into going there.  Because, as an attacker, if we can get our script to run in the context of that page, we own the page, we can do anything we want.

And cross-site scripting is usually an indication of a much larger problem.  Now it can be used as an attack in and of itself but it’s also usually combined with additional attacks like cross-site request forgeries, click jacking, lots of others.  So we usually use it – I say “we” – attackers usually use it as a basis, a foundation – a little slip there – to launch additional attacks.

So how prevalent is cross-site scripting?  In 2006, out of 12,000 sites scanned, out of all the vulnerabilities from those 12,000 sites, almost 70% were cross-site scripting.  OK, so that’s a big chunk of vulnerabilities out there.  2007, in 33,000 sites, almost 70,000 vulnerabilities and 60% of those were cross-site scripting.  So we’ve come down a little bit through education and presentations such as this but it’s still a huge problem.

Well how prevalent is it in higher education?  WhiteHat Security, which is a consulting firm that does security consulting, from the time period of January 1st to August of this year, of all the sites they scanned, of all the educational sites, 88% of the educational sites had either high, critical or vulnerable exploits in their sites.  And we just saw that 66% of those critical are cross-site scripting so that’s still a huge number of higher ed websites that are vulnerable to cross-site scripting.

All right, so what are some of the dangers of cross-site scripting?  Again, cross-site scripting is usually just the first step in a much larger attack.  It’s again, trying to trick your users that trust your site to go into that page that’s owned by the attacker and then maybe phishing.  Gathering up some information for identity theft, collecting email addresses for spamming.  And one of the real dangers is it is platform independent.  It is completely platform independent.  It doesn’t matter if you’re on Mac, a Linux machine.  It doesn’t matter what platform you’re on as far as an OS, it all runs within the browser.

How many of you have heard about the talks about the browser becoming the next OS?  Well that’s actually kind of dangerous because it’s platform independent.  If everybody’s got a browser and that’s the next OS, if you find an exploit in the browser or on a page, it can spread like crazy.

How many of you remember Samy that hit MySpace a couple of years ago?  Yeah?  All right, what was it; over a million users were exploited within a couple of hours.  And so it spreads incredibly fast and we don’t have malware scanners on our browsers, at least not yet.  That stuff can just spread like wildfire.  So it spreads much, much faster.

We can do JAR exploits, which are getting cleaned up.  JAR is a Java file.  There were actually some exploits you could do where you could launch an attack through a website, through a website vulnerable to cross-site scripting, into a JAR file, and then the JAR file would pass back over to that exploit, back into the system, to launch programs within the context of the user that’s logged in.  So it could actually launch your calculator as you, all through a website, all through a cross-site scripting vulnerability.  All right, so huge problems.

And the last is my favorite, whatever your devious little mind can come up with.  Whatever you can think of, once your script is on the page you own the page.  You can record every click, you can record every keystroke, you can take them anywhere you want them to go, you can make them believe anything you want them to believe.

Why is it so dangerous to higher education?  Because people trust our sites.  As an example, my wife is a biology and chemistry teacher at a high school and the other night she was grading papers and grumbling about it.  And the subject of Wikipedia came up because every once in a while she’ll have a student who tries to use Wikipedia as a reference.  I said, “How do you handle that?  Do you actually go out to every website they reference and check to make sure it’s legit?”  And she says, “No, no.  Usually what we do is we give them a list, they’re supposed to go there or if it’s an EDU and the content looks okay, yeah, we’ll trust it.”

Right there, people trust EDU websites.  In fact at least until recently, Google gave the EDU domain a higher page ranking because the EDU domain is locked down, not just everybody can get one.  So people trust your sites and that’s why we’re so incredibly important to attackers because if they can find an EDU that has an exploit, they can use that to trick additional users.

Now you might be saying, “OK, but my users, the bulk of my users, we’ve trained them.  They know what to look for.  We trained them as best we can.”  This report just came out two weeks ago.  North Carolina State University did a study and what they did is they sat people down, they had them working on their computer, had a browser up, browsing a little bit, do some work.  And every once in a while they would send in a fake pop up. 

Like everybody clicked on it and they said, “Something’s got to be wrong.  Send them back.”  And they said, “OK.  You can be working, you can have your browser up, you can be moving around and occasionally we’re going to send you some fake pop ups that’s going to have malware so you want to be really careful what you click on.”  And despite the warning, they hit the “OK” button 63% of the time.  Despite being warned that some of these are going to be bad, they did it anyway.  So again, if somebody can get on your site through a vulnerability, people just click on everything, they’re going to be able to get to your users.

So there are three types of cross-site scripting.  First is called non-persistent or reflective.  The most common and the exploit only lives as long as the user is at that URL.  Again, it does nothing to your server.  You won’t even know anything happened unless you’re looking back over the server logs.  So it only lasts as long as they’re there, but it relies completely on social engineering.  In other words, I have got to trick you, I’ve got to trick your users into going to that link so I can launch that exploit.

The second type is persistent or stored.  This exploit is actually stored somewhere in the database or some other data storage and then replayed back to everybody that comes to the webpage.  Much more dangerous, and it’s usually common on Web forums and social sites.

The last is called local, that’s that JAR I was talking about.  It’s less likely but still dangerous.  Less likely because you have to know the file paths of the files you’re trying to attack.  You have to really know the systems those people are on.

So I’m going to give you some examples and then we’re going to do live demos.  Now I drank all of my coffee that I normally drink in four hours in about 30 minutes so I could talk faster.  But if I am talking to you fast or if I skipped over and something is not clear, just let me know.  I’m just going to give you some verbal examples and we’re going to go back and do some live examples.

So OPA, OPA is the codename for the University of Missouri’s Online Job Application system.  It’s just easier to say OPA.  And if I say “paw print” that’s what I call our single sign on ID’s.  As I go through these examples I’ll be saying those.

So at our online job application system, applicants have to register and they store sensitive data.  They store, addresses, previous work history, social security numbers, phone numbers, etc.  All the information that anybody would ever need to be able to do some identity theft.  Shaun happens to find a reflective cross-site scripting vulnerability.  And now what’s Shaun going to do?  He’s going to attempt to trick Jane into going to that URL.  So maybe he says, “Hey!  You know I found this job, it sounds great for you.  I know you’re wanting to move, change jobs, it’s perfect!”

So he sends it out to her in an email or maybe IM or Twitter or something and she heads over.  Now it’s reflective so it means the payload, the actual exploit, is in the URL and it comes up as soon as she’s there.  And if she’s already logged in or she attempts to log in, because we control that page, we can gather up all that information and send it back offsite.  If she’s already logged in, I can grab her cookie ID.  And if the site is vulnerable to cross-site request forgeries, I can take over her session and get to all her data.

You’re looking at me with blank stares.  Is that making sense or are you just really scared at this point?

[Laughter]

Audience 1:  You’re talking really fast.

Paul Gilzow:  I am talking really fast.  You can tell me to slow down because I’ve got the coffee and the nerves.  Again, we’re going to do live demos too, so you’ll be able to see it more in depth.  Everybody good on reflective so far, have a pretty good idea what it is?

All right, persistent.  Let’s say that OPA also has a Web forum that’s susceptible to cross-site injection.  Shaun finds out about this, he creates a thread, embeds that exploit in there.  And typically, they’re usually inflammatory statements because what you’re trying to do is get as many people to come here as you can.  So he’s going to say something just really outrageous to try to get people to come.

And so Jane comes over and she views the thread and because that injection is coming from the Web server, as far as the browser is concerned it’s OK.  It bypasses that same origin policy and we can now send again all that data back to the attacker.  But it’s not just Jane, it’s everybody.  Anybody that goes to that webpage is going to have the same thing happen to them over and over and over again until somebody finds out and removes that exploit.  OK, good?

OK, local.  Like I said, local is a little bit harder.  Let’s say Jane visits a compromised site that has a local based cross-site scripting attack.  The malicious JavaScript on the page is going to launch an attack against an HTML page on Jane’s machine.  So how many of you have noticed that a lot of programs nowadays their help files are actually HTML pages?  Yeah, all of Windows Help, all of those.  Yup, everything.  A lot of pages now are HTML pages – help files.

So all you have to do is find one of those that are vulnerable.  And if you know the path you can actually have the browser launch those pages, inject into that page.  And the dangerous part about this is that page is on the local machine and runs within the context of the user logged in.  So now, the script is running as you and could do anything else in your computer that you can do.  Again, very dangerous but less likely because you have to know what platform they’re on, you have to know the file path, etc.  Real dangerous just less likely.

All right, any questions before we actually do the live demos because I know that’s what everybody is here for.  Any questions at all?  No? 

OK, I will warn you that penetration testing on somebody else’s website is a legal gray area.  It is currently illegal in Germany; in the US, it has not been tested, so I strongly recommend that you do not go out and start trying to do this on other people’s websites.  The websites I’m going to show you have either had the exploit published and the vendor notified for well over a year or I have talked to the owner and gotten an OK.  I don’t see Jason in here.  Jason Woodward from Cornell?  He’s not in here, OK.  He said I could show this.  So just to let you know.

So I’m going to go over to my windows.  Oh, let’s hope this is working now, if not I guess I’ll just flip back.  OK.  How many of you would probably trust a link from NBC.com?  NBC.com is a pretty well known website, right?  So let’s head over to NBC.com then.  So let me pull up NBC search.  There we go.  Nothing.  I don’t know why that menu is not going away.  There we go.

All right.  So this is what happens.  An attacker will go in and do some reconnaissance.  In reconnaissance, he or she is going to try to find out how the Web application handles information.  They usually use some string with different characters and some type of keyword to help them find that information when the page comes back.  If you look in your server logs and you see ABX, I visited.  And I did do a quick little survey before I came here to the conference and six out of 10 of you have cross-site scripting vulnerabilities.  Now if I find your website and it had a cross-site scripting vulnerability, I’ll be contacting you before the conference is over to share that information with you.  But I just wanted to get a feel, is it really as bad as all the stats look?  Yeah, it kind of is.

So let’s go ahead and hit “search” and see what happens.  OK, it comes back and this is pretty typical.  It says “search results for” and then the string output.  You see that all the time.  You search at Google, it comes back, it says here’s what you searched for.

All right, what we want to do is we want to see what actually came back in the source.  Now can everybody see that or do I need to make it a little bigger?  Is that good in the back?  OK.  So I’m going to say find that ABX.  So it looks like it’s putting it in Java Script but it’s escaped the quote so maybe I can try to play and escape the escape and try to get through there but I don’t know.  That didn’t look too good so let’s see what else is in here.

There’s that one – URLencode and everything.  So I probably can’t do anything there but everybody see that?  What did they do?  What’s that?

Audience 2:  They did a comparison.

Paul Gilzow:  They did a comparison but what did they leave?  When they put the data back into the page, the information?  They didn’t do anything, did they?  They just put it back in verbatim.  That’s the first sign.  I see that and I’m like, “Oh, wait a second.  Let’s try something.”
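
The reconnaissance step described here, sending a marker string and looking at how it comes back in the page source, can be sketched in Python as below. This is an illustration under assumptions, not the tool used in the demo: the probe string, the function name, and the sample page fragments are all made up for this sketch.

```python
import html

# Hypothetical probe: a keyword that is easy to find plus characters that matter in HTML.
PROBE = "abx\"<>'"


def classify_reflection(probe: str, page_source: str) -> str:
    """Report how a page handed the probe back to us."""
    if probe in page_source:
        # The characters came back untouched: the first warning sign.
        return "verbatim"
    if html.escape(probe, quote=True) in page_source:
        # The page converted the special characters to HTML entities.
        return "html-encoded"
    return "not found"


# Made-up page fragments standing in for real responses.
print(classify_reflection(PROBE, "<p>Search results for abx\"<>'</p>"))
print(classify_reflection(PROBE, "<p>Search results for abx&quot;&lt;&gt;&#x27;</p>"))
```

Running the same check against your own pages is a quick way to find the spots where input is echoed back verbatim.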

All right, so I’m going to go back here and notice the URL still.  It just says “app search”.  So what method for this form is it using?  What’s that?  Post, right, because there’s nothing in the URL.  I did a search, there’s nothing in the URL.

So let’s try this.  Let’s say I write script, alert one.  Be real basic and let’s see what happens.

[Laughter]

All right, so let’s look in the page source and let’s see if we can find that.  This might take a while.  There’s probably lots of ones.  Use something a little bit better.  Maybe I should search for script.  Let’s try script.  Should have used a better keyword in there.  Oh!  There it is!  Again, they put in verbatim and so I was able to inject that code.

As an attacker though I still have a little problem because can I send them a URL to this page and actually run my attack?  I mean there’s some JavaScript you could do if I can find another site that’s exploited.  I can have it post the data over here but I can’t just send them to NBC.com at this point and have the attack run because there’s nothing in the URL.  I haven’t been able to put it in.

How many of you use Firebug and Web Developer in Firefox?  Have you ever messed around with the forms piece?  What’s one of the things you can do with the forms piece?  Yup!  You can change the post to get.  So that’s the next thing you do, you go, “All right.  I’m going to change the post to get” and I’m going to try that same attack.  And if I’m lucky I’ll get it in the URL.  Attack runs again and what’s in the URL?  Oops!  Scoot over here so you can see it.  There it is.

Now I can post that to a user and get them to come here.  I know some of you might be thinking, “Ooh, a pop up!  Ooh, scary!”  [Laughter]  There’s more.  This is all you need.  Once you know you can do this you know you can do anything and I’ll show you in a second.  But you can see how dangerous this is.

Now post isn’t the answer either.  You can’t just say, “I’m going to change everything always to post” because all I have to do is find another site that’s vulnerable, create a link to your website and behind that link put JavaScript that will post the data first.  Again, just because you change everything to post it doesn’t mean you’re safe.

I’m going to show you another one and this is Cornell.  Everybody familiar with Cornell?  You know a higher education institution, very well known, high level of trust among users.  All right, we’re going to do the same type of attack.  I’m going to go up here to their search bar and I will tell you that pages is not vulnerable, people is.  We’re going to use the same little string, and go to search.  It’s kind of hard to see.  You might not be able to see it but look up at the search box and can you see what happened?  Everybody see it?  My string is kind of hanging out there.

Let’s find it and did you see what happened?  Let me pull it up here a little bit higher.  Remember buffer overflows from a couple of years ago?  Yeah?  OK, this is real simple.  Very reduced explanation but essentially in a buffer overflow you’re trying to overflow the little area that’s contained for your code.  And try to spill out into another area where your code can execute.

And that’s kind of what we’ve done here.  We’ve seen that he or she has just put in verbatim what we put in the search feature.  And so what I did is I put a quote in there and that has essentially ended what should be in the value.  And now all my stuff is spilling out or overflowing into the rest of the page.
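
To make the "spilling out" concrete, here is a small Python sketch that builds a search box by pasting the input into the value attribute verbatim and then asks an HTML parser which tags it sees. The markup, the function names, and the harmless bold-tag payload are assumptions for illustration; the actual page in the demo is not reproduced here.

```python
from html.parser import HTMLParser


class TagLister(HTMLParser):
    """Collects every start tag the parser sees, with its attributes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, dict(attrs)))


def tags_in(markup: str):
    parser = TagLister()
    parser.feed(markup)
    parser.close()
    return parser.tags


def search_box_verbatim(value: str) -> str:
    # Hypothetical template that echoes the search term back with no encoding.
    return '<input type="text" name="q" value="' + value + '">'


# An ordinary search term stays inside the value attribute.
print(tags_in(search_box_verbatim("abx")))

# A quote ends the attribute early, so the rest is parsed as new markup.
print(tags_in(search_box_verbatim('abx"><b>spilled</b>')))
```

In the second call the parser reports an extra tag that the page author never wrote, which is the same effect as the string hanging out next to the search box in the demo.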

So I’m going to show you another one, this one is a little worse.  I’ve got this one saved so I didn’t have to type it all out.  But you can do this kind of stuff as it loads.  Same exact attack.  [Laughter]  Page defacement!  I’m sorry the screen is kind of small, can’t quite see it.  But I have loaded in a JavaScript file.  You can see it up here I have a JavaScript source from XSSed.net.  XSSed is a group that records sites that have cross-site scripting vulnerabilities and then works with the vendors to correct those.  And so that’s another thing I always do.  I never use my own scripts because I don’t want anybody coming back to me getting mad.  Because if they go there, there’s an automatic warning.  “If you are a website owner here’s what you need to know,” and they start working with you.

But I can control the page.  I have gotten my script to load.  I now own the page.  Completely, it is mine.  If you’ve done JavaScript development you know once you’ve got access to the DOM, you own it.  You can do anything you want.  Yes?

Audience 3:  Just a little clarification, this is only going to work for the person who you send that link to?

Paul Gilzow:  Correct, this one.

Audience 3:  This one.  No other users are vulnerable.  You really haven’t affected the webpage at all.  You’ve only affected the webpage for that one user who sees this.

Paul Gilzow:  Exactly, yes.  And that is reflective cross-site scripting.  Reflective is only for the user that goes there.  The website itself, not affected at all.  And again, if you don’t check your logs you’ll never know it happened.  You won’t have any idea.

Audience 3:  Until they call you up and say…

Paul Gilzow:  Until they call you up and say…

Audience 3:  ...given me this.

Paul Gilzow:  Exactly.  Everybody kind of see what’s going on with this?  When I’m down to 15 minutes left, can somebody yell?  OK, just want to make sure.

Let’s do now a stored.  Let me launch this.  So this is kind of just a little example, real basic, kind of like Twitter.  You put in a little comment, you hit “leave a comment” and it posts the previous comments.  So this is a persistent demo.  Again, we’re going to do the same thing.  All right, start out with the same little keystrokes, put keywords in here.  Oops!  I need space between there too.  Leave my comment, it shows up down here... 

There we go.  Again, what’s this application doing?  Verbatim again, right?  So what am I most likely going to be able to do?  Inject scripts in there.  So let’s try that.  Did I spell it right?  Yeah, OK.  There we go.  And because it’s a script tag, it doesn’t even show up.  I could probably inject other stuff too.  I could probably put something right after the script tag to make it look like something showed up.  But the real danger here is it is live no matter what.  So if I open up IE, maybe, there we go.  And I go there – boom!  Every user that comes to this page – let me do it in Opera – now it wants to run slow, sorry about that.  There we go.  Same page.  Every user that comes to this page is now going to have that exploit running in there somewhere.

OK, let me give you an even more concrete example of how dangerous this is.  As part of Web Communications, one of our jobs is to evaluate other websites for the different colleges and departments on our campus.  About a year ago the student information systems and – Jason, don’t go back and tell them, my colleagues here, that I’m telling on them.

They came to us, they say, “Hey!  We’re getting ready to launch this degree audit system where our students can go in and they can evaluate how far along they are in their degree.”  They can see what classes they’ve taken, what classes they need and they can interact with their faculty and their advisors about their degree program.  As I was going through it I found a persistent cross-site scripting exploit inside of a student area where the students go that faculty and advisors had access to.

To bring home the point, what I did is I injected it, had them log in as a faculty member.  When the script ran, I had it take out the page and present a screen back that said, “Your session has timed out.  You need to log back in.  Please enter your paw print and password.”  They entered their paw print and password, hit “submit”, my JavaScript catches it, sends it offsite, restores the page and removes the script.  And because it’s persistent, who does it happen to?  Every faculty member and advisor that comes to that page.

And so now what have I done?  I’ve got the usernames and passwords of faculty.  So think about it, an attacker – how easy it is to get a freshman’s username and password?  Simple.  Yeah.  Password = “password”.  It’s simple.  OK, so they don’t have access to anything but they have access to this.  So I get that, I find this exploit, put my injection there, faculty, advisors come in, now I’ve got faculty and advisors’ passwords and now I’ve got access to a whole bunch more.

I’ve got access to the VPN; I’ve got access to the intranet.  I’ve got access to other students’ grades.  I’ve got access to all kinds of stuff.  Making sense?  Some of you probably remember, Obama’s website got hit.  Hillary Clinton’s website got hit, all kinds of websites have been hit with this.  But we’re even more dangerous because we have such large user populations, we have federal regulations, FERPA.  If this stuff gets out it’s really bad.

All right, so let me slip back into my presentation.  So what can we do to protect our applications?  And where am I at time wise?  Am I good?  What time is it?  Quarter after?  OK, about 15 minutes?  OK.

Well first thing you need to do is to put on your tin foil hats and you need to be very, very, very, very paranoid – very paranoid.  You can trust no one.  I don’t care where your input is coming from you don’t trust it.  Everything coming into your application is an attack.  That’s how you have to think.  You just have to do it that way because let’s say you’re getting data from a database.  How do you know that data in that database is OK?  How do you know that somebody else hasn’t put data in there from some other application that uses that database?  And now you’re going to get it and then relay it back out to your users.  You can’t trust anything as an input.  Everything is a possible attack.

In addition, you use layers.  Lots and lots and lots of layers.  It’s a military term, defense in depth, because what you’re trying to do, you’re not going to stop the attacker.  If an attacker wants to get in, he or she is going to get in eventually.  It’s just a matter of time and energy and money.  You can’t stop it but you can make it really, really hard.  And what you don’t want to be is the low hanging fruit, the easy site to attack because that’s what they want.  They don’t want to put a lot of energy into it.  They just want to get some simple ones and then go to the next site.

So you’re going to use lots and lots of layers.  Some layers that you can use, one is input filtering, it’s also known as blacklisting.  So you can go in and you can say, “OK, if I see a less than symbol, I’m just going to remove it.”  But what’s the problem with doing that as a single defense? 

Audience 4:  [Inaudible]

Paul Gilzow:  Exactly.  There’s totally legitimate reasons why you might want to allow some of that input.  In addition, how many different character sets are there?  Lots of different character sets.  So if I can somehow manipulate your page to allow UTF-7, well then I can inject UTF-7 less than, and now I’m back to where I was again.

Blacklisting is like chasing your tail, you’re just going to keep going and going and going.  You’re never going to be able to keep up.  It’s OK as one part of the multiple layered systems but this can’t be the only one that you use.
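
A short Python sketch of why a blacklist on its own keeps losing. The talk’s example is a UTF-7 encoded less-than sign; the sketch below swaps in a different, simpler bypass (a script tag nested inside another one) because it is easy to run. The filter and the input string are assumptions made up for this illustration.

```python
import re


def naive_blacklist(text: str) -> str:
    """Single pass that deletes anything that looks like a script tag."""
    return re.sub(r"</?script[^>]*>", "", text, flags=re.IGNORECASE)


# The obvious attack is removed...
print(naive_blacklist("<SCRIPT>alert(1)</SCRIPT>"))

# ...but removing the inner tags here reassembles the outer ones.
print(naive_blacklist("<scr<script>ipt>alert(1)</scr</script>ipt>"))
```

Each fix to a filter like this invites the next workaround, which is why it only makes sense as one layer among several.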

The next one is the one that I stress highly is input validation.  You have to validate that input that you’re receiving.  So if you’re asking for a social security number, what should be in a social security number?  Numbers and – how many numbers?  Nine.  No more, no less, all numbers.  So that’s all you allow.  You say if it’s longer than nine, if the character length is longer than nine spit it out.  If it’s not all numbers, spit it out.  For everything that you can possibly validate, do.
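
The social security number rule described above (nine characters, all digits, nothing else) can be sketched as a whitelist check in Python. The function name is an assumption, and the sketch assumes the form asks for the digits without dashes.

```python
import re

# Exactly nine ASCII digits; [0-9] is used so other Unicode digits are rejected too.
NINE_DIGITS = re.compile(r"[0-9]{9}")


def is_valid_ssn(value: str) -> bool:
    """Accept the value only if the whole string is nine digits."""
    return NINE_DIGITS.fullmatch(value) is not None


print(is_valid_ssn("123456789"))
print(is_valid_ssn("123456789<script>"))
```

Because the check describes what is allowed rather than what is forbidden, an injection attempt fails for the same reason a typo does: it is not nine digits.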

I had a user once who I injected through a drop-down list because you know if you’ve used the Web developer toolbar you can change those drop-down lists to text inputs.  “How did you do that?  I had a drop-down list.”  It doesn’t matter.  You can’t control the client’s side once it’s gone from your server, the user controls it.  So even those you have to validate.  At least with the drop-down list you should have an array or a list of possible values and then compare them against them.  So input validation.
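
A sketch of the drop-down check in Python: the server keeps its own list of the values it offered and compares whatever comes back against that list. The field name and the option values are hypothetical.

```python
# Hypothetical options that the server rendered into the drop-down list.
ALLOWED_DEPARTMENTS = frozenset({"biology", "chemistry", "physics"})


def validate_department(submitted: str) -> str:
    """Return the value only if it is one of the options we actually offered."""
    if submitted not in ALLOWED_DEPARTMENTS:
        raise ValueError("unexpected value for department")
    return submitted


print(validate_department("biology"))
```

It does not matter how the value was sent, drop-down list or edited text input; anything not on the server’s list is rejected.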

The next one is output encoding.  So for those symbols that you should allow, maybe like a less than, especially if it’s in like a math thing, when you output that data back to the browser, encode everything into their HTML entities because if it’s an “&lt;” I can’t do anything with that as an attacker because it’s not actual executable code.  It’s just a symbol.
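
As a minimal illustration of output encoding, Python’s standard library has `html.escape`, which turns the characters that matter in HTML into entities. The wrapper name below is an assumption; the sketch covers values placed in element text or in quoted attributes, not values placed inside script blocks or URLs.

```python
import html


def encode_for_html(value: str) -> str:
    """Encode &, <, > and both quote characters as HTML entities."""
    return html.escape(value, quote=True)


# A legitimate math expression survives, just as harmless text.
print(encode_for_html("if a < b and b > c"))

# An injection attempt comes out as symbols the browser will only display.
print(encode_for_html('"><script>alert(1)</script>'))
```

The encoding is applied at the moment the data goes out to the browser, so it also covers data that arrived from a database or another application.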

The next one you can do is called an intrusion detection system.  It’s also known as a Web application firewall.  There was some controversy with those for a while because people thought they were the be all, end all of your solution and they’re not.  They’re just another layer but they are handy.  I do want to show you a quick demo of one so that you have a better idea of what it does.  Let me get back to my Firefox.

Now this one happens to be for PHPIDS.  It’s an intrusion detection system.  And what it’s going to do it’s going to evaluate all the post, all the gets and try to determine if an attack is coming in and then it scores it.  I’m going to put in an attack here real quick so you can just kind of see.  And here it says injection found.  List out the rules and then scores each of the rules that that attack has violated.  Down at the bottom it gives me an overall impact.  And that’s all it does.  It just evaluates the incoming and says, “This is what I found.  These are the rules that it has violated that are common with attacks and here’s a score.”

From there you can do whatever you want.  You can say, “Anything more than 10 is a definite attack and I’m going to block you” as best you can.  Or you can simply store it in a session and say, “Maybe you weren’t really but if you bump over 25 total then I’m going to kick you out.”  Again, it’s just another layer because if an attacker tries to inject and hits this and you’re blocking him, well now, I got to go and get another IP and they come back.  And they’ve got to get through all your other layers.  It’s just more layers on top of each other. 
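
The scoring idea can be sketched in a few lines of Python. This is a toy, not PHPIDS: the four rules and their impact values are invented for illustration and would produce false positives on real traffic. Only the two thresholds (more than 10 on one request, more than 25 across a session) come from the description above.

```python
import re

# Made-up rules: (pattern, impact, description). Real rule sets are far larger.
RULES = [
    (re.compile(r"<\s*script", re.IGNORECASE), 8, "script tag"),
    (re.compile(r"on\w+\s*=", re.IGNORECASE), 5, "inline event handler"),
    (re.compile(r"javascript\s*:", re.IGNORECASE), 5, "javascript: URL"),
    (re.compile(r"[\"'][^\"']*>"), 3, "quote followed by tag close"),
]

REQUEST_LIMIT = 10   # more than this on a single input: treat as an attack
SESSION_LIMIT = 25   # more than this in total: end the session


def score_input(value: str):
    """Return the total impact and the list of rules the input matched."""
    matched = [(name, impact) for pattern, impact, name in RULES if pattern.search(value)]
    return sum(impact for _, impact in matched), matched


class SessionMonitor:
    """Keeps a running total so repeated low-scoring probes still add up."""

    def __init__(self):
        self.total = 0

    def check(self, value: str) -> str:
        impact, _ = score_input(value)
        self.total += impact
        if impact > REQUEST_LIMIT:
            return "block"
        if self.total > SESSION_LIMIT:
            return "end session"
        return "allow"


monitor = SessionMonitor()
print(score_input('"><script>alert(1)</script>'))
print(monitor.check("plain search terms"))
```

What to do with the score is a policy decision; the detector only reports what it matched and how heavily it weighs.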

Something else that you can do is tidy the output.  Anybody heard of HTML Purifier?  A couple of you?  It simply takes HTML input, evaluates it and then purifies it back to some DOCTYPE.  I’ll give you a quick example of that.  Bring up my bookmarks again.  There it is.

So let’s say I’ve got div id=blah and then put an end quote, text, and then I end it with a paragraph.  This is bad, really bad HTML, right?  I didn’t start it with a quote but I’ve ended it with a quote, and I’m ending the div with a paragraph instead.  I can say, “All right, I want this to be XHTML 1.0 Transitional.”  I’ll submit that and HTML Tidy will take it, convert it back to this.

So if you’ve got characters you need people to enter, or you’ve got maybe a section where you’re allowing them to do some basic HTML, you can use this to at least make sure that what they’ve entered is valid HTML, because with the double quote in there, where I’m trying to break out of that double quote, it’s not going to allow it.  It’s going to correct that and fix it.  Yes.
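
A much-simplified sketch of the same idea, using only the Python standard library.  It is not HTML Purifier and does far less: it keeps a small allow-list of tags, drops every attribute, encodes all text, and closes anything left open.  The allow-list and the names AllowListSanitizer and sanitize are assumptions for illustration, and a real site should rely on a maintained library instead.

```python
import html
from html.parser import HTMLParser

# Hypothetical allow-list of harmless formatting tags.
ALLOWED_TAGS = {"b", "i", "em", "strong", "p"}


class AllowListSanitizer(HTMLParser):
    """Rebuild a fragment from allowed tags only, with all attributes dropped."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED_TAGS:
            # Attributes are discarded, so there is no quote to break out of.
            self.out.append(f"<{tag}>")
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in self.open_tags:
            # Close this tag and anything opened after it, in order.
            while self.open_tags:
                top = self.open_tags.pop()
                self.out.append(f"</{top}>")
                if top == tag:
                    break

    def handle_data(self, data):
        # Text is re-encoded so it can never be read as markup.
        self.out.append(html.escape(data))


def sanitize(fragment):
    parser = AllowListSanitizer()
    parser.feed(fragment)
    parser.close()
    # Close any allowed tags the user left open.
    while parser.open_tags:
        parser.out.append(f"</{parser.open_tags.pop()}>")
    return "".join(parser.out)


print(sanitize('<b onclick="evil()">hi</b>'))
```

Tags outside the allow-list are dropped, while their text content is kept as encoded text.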

Audience 5:  Can you use kind of a...

Paul Gilzow:  Yes.  At least for the PHP one, I know there’s a dot-net one.  I don’t know about any of the other languages but they have an offshoot for dot-net as well.  All right, any other questions?  There’s also AntiSamy.  AntiSamy was born out of the Samy attack on MySpace.  It is kind of like the HTML Purifier.  Again, it’s allowing HTML to come in and actually be valid and then making sure there are no attacks embedded.  Again, another piece that you can add, another layer that you can add to that security.

I actually finished, wow!  Sorry I was a little fast.  I apologize for that.  So we’ve got five minutes left.  Five minutes, so five questions.  Are there any questions?  Yes.

Audience 6:  [Inaudible]

Paul Gilzow:  Lots of regular expressions, tidy the output.  It’s hard.  That is a hard one when you are trying to allow them to do actual HTML.  And that’s why BBCode was so popular, because then you can convert it and you can kind of control it.  That’s even injectable but it’s not easy.  It is difficult.  You just have to make sure that what they’re putting in is what you want to allow them to do.  And I’d definitely check out AntiSamy because that is specifically what that is designed to do.
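
The BBCode approach mentioned here can be sketched as: encode everything the user typed first, then convert a small fixed set of bracket tags into HTML you control.  The three supported tags and the function name bbcode_to_html are assumptions for illustration, not a complete BBCode implementation.

```python
import html
import re

# Hypothetical subset of BBCode mapped to the HTML tags we choose to emit.
SIMPLE_TAGS = {"b": "strong", "i": "em", "u": "u"}


def bbcode_to_html(text):
    """Convert [b], [i] and [u] pairs to HTML; everything else stays inert text."""
    # Encode first, so any raw HTML in the input is already harmless
    # before we add markup of our own.
    safe = html.escape(text, quote=True)
    for bb, tag in SIMPLE_TAGS.items():
        pattern = re.compile(rf"\[{bb}\](.*?)\[/{bb}\]", re.I | re.S)
        safe = pattern.sub(rf"<{tag}>\1</{tag}>", safe)
    return safe


print(bbcode_to_html("[b]hi[/b] <script>"))
# prints: <strong>hi</strong> &lt;script&gt;
```

Because the only HTML in the output is what the converter itself writes, the set of possible tags is fixed in advance.  As the speaker notes, richer BBCode (links or images with user-supplied URLs, for example) is still injectable if those values are not validated.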

Audience 6:  [Inaudible]

Paul Gilzow:  Yes.

Audience 6:  [Inaudible]

Paul Gilzow:  And that is exactly why they do it that way because it’s easier to say, “If I’m going to format my HTML this way then I know what I should be getting and then I can convert it over to HTML.”  You’ve got a question?

Audience 7:  [Inaudible]

Paul Gilzow:  Good question.  I don’t know about log files because I’m not a sys admin.  There are tools though that you can use.  One is XSS Me.  What this will do is it’ll run a full collection of cross-site scripting attacks against every form possible in a page and then give you those results.  It’s a little Firefox extension.  There’s also something called HackBar.  So not only will it do cross-site scripting, it’ll also do MS SQL and MySQL, and it can also do encodings so that you can attack – again, you’re not supposed to attack anybody else’s sites, only your sites with your system admin’s approval.  Yes.

Audience 8:  What day were you doing that?

Paul Gilzow:  What day was I doing what?

Audience 8:  When you were looking so I can look at my logs.

Paul Gilzow:  It was over the last two weeks essentially.  Again, if I found one, I’ll contact you for sure. 

Audience 9:  [Inaudible]

Paul Gilzow:  I know there are tools to search in logs.  I just don’t do system administration so I don’t usually have to go back over the logs.

Audience 10: [Inaudible]

Paul Gilzow:  That’s true too.

Audience 10:  Push it out to the end...

Paul Gilzow:  Yeah.  I’m not real well versed in the logs because luckily enough I don’t have to mess with any system administration.

Audience 11:  [Inaudible]

Paul Gilzow:  Yes.  Exactly.  SQL injection is even worse, and the same strings that you saw me using are used in SQL injection because you usually do your qualifiers with double quotes or single quotes.  But at least with SQL servers, most of those now have some type of binding or encapsulation.  The vendors have tried to lessen that attack but they’re still extremely dangerous.  In fact, I found one site – I won’t say who it was – that actually dumped out the SQL statements in their comments as you ran searches.  So now, you’ve got table names, field names, all that stuff.
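
The "binding" mentioned here usually means parameterized queries.  A minimal sketch using Python's built-in sqlite3 module follows; the people table, its rows, and the function name find_people are made up for illustration.

```python
import sqlite3

# Hypothetical in-memory table used only for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, office TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Ada", "101"), ("Grace", "202")],
)


def find_people(conn, name):
    """Look up rows by name using a bound parameter, not string concatenation."""
    # The ? placeholder keeps the value separate from the SQL text, so
    # quotes inside `name` cannot change the structure of the query.
    cur = conn.execute(
        "SELECT name, office FROM people WHERE name = ?", (name,)
    )
    return cur.fetchall()


# A classic quote-based injection string is treated as a plain (non-matching) name.
attack = "' OR '1'='1"
print(find_people(conn, "Ada"))
print(find_people(conn, attack))
```

Had the query been built by concatenating the input into the SQL string, the same attack value would have matched every row.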

Audience 12:  [Inaudible]

Paul Gilzow:  Yeah, clickjacking is a whole other thing.  You do cross-site scripting and you lay another div on top of a page.  You make the bottom page translucent and then put things for the user to click on but they’re actually clicking on the stuff below.

Audience 12:  [Inaudible]

Paul Gilzow:  Yeah.  So you can bring up their bank site in the background, they click “transfer funds” even though they think they’re clicking something else.  And they hit “submit” and they think they’re posting a comment on Twitter but they’re actually just transferring money to you.

Audience 13:  [Inaudible]

Paul Gilzow:  Yeah, script jacking is fun too.

Audience 14:  [Inaudible]

Paul Gilzow:  Web server firewalls.  Again, you can use it as a layer but I wouldn’t depend on it 100%.  It depends on the Web server’s firewall because it doesn’t analyze the SSL traffic does it?  It does?  OK.  It’s just another layer that you can use.  I wouldn’t trust it because again, you’re never going to fully stop somebody from attacking you and getting through if they are determined.  You can only make it cumbersome.  Yes.

Audience 15:  This question has already been asked.  I’m sorry, what is the most... at this point in time?  What is going to be?

Paul Gilzow:  I would say output encoding.  If I had to just name one, if you could only do one today, I’d say output encoding because at least then you’re limiting a big bulk.

Audience 15:  How does that work in PHP?

Paul Gilzow:  In PHP, it’s just a function.  You say take my string and then convert it into HTML entities.  I don’t remember what it is in dot-net off the top of my head but…

Audience 15:  So anything...?

Paul Gilzow:  Yes.

Audience 16:  [Inaudible]

Paul Gilzow:  Correct.

Audience 17:  HTML.

Paul Gilzow:  HtmlEncode, in dot-net.  Another thing is if you’re going to the WordPress thing today, I love open source stuff.  I do, I really do.  I really, really do but some of that stuff, especially WordPress, is really bad about these injections.  And I don’t want to ding them but you’ve got people who aren’t necessarily security experts programming their stuff and they haven’t thought about these things.  So if you install WordPress and then start plugging in all these different plug-ins and modules and not checking the code, you have no idea what you just allowed.

Audience 17:  That’s more.

Paul Gilzow:  At 2.6.2, it’s solid right now but 2.6.1 had a remote injection vulnerability where I can take over the admin password and then lock everybody out.  I’m not saying it’s a bad product, just make sure you stay on top of the updates.  I mean WordPress MU, which is supposed to be a little bit more secure, just came out with a whole bunch of cross-site scripting vulnerabilities.  One of the problems with open source is that all the attackers can open it up and take a look.  Again, it’s a great product to use, just make sure you stay on top of the updates and you harden it up as much as possible.

Host:  That’s all we have time for today.

Paul Gilzow:  Oh, that’s all we got.  Thank you.

[Applause]

Announcer:  For more presentations from the 2008 HighEdWeb Conference visit HighEdWeb.org/2008 or sign up for our podcast and feed at HighEdWeb.org/podcast.xml

[End of Music]