Understanding the Real-World Performance of your Web Application Across IE11 and Other Browsers


In this IE Blog article I talk about how to use the Timing APIs to understand the real-world performance of your web applications.

Together with Google, Mozilla, Microsoft, and other community leaders, the W3C Web Performance working group has standardized the Navigation Timing, Resource Timing, User Timing, and Performance Timeline interfaces to help you understand the real-world performance of navigating, fetching resources, and running scripts in your Web application. You can use these interfaces to capture and analyze how your real-world customers are experiencing your Web application, instead of relying on synthetic testing, which tests the performance of your application in an artificial environment. With this timing data, you can identify opportunities to improve the real-world performance of your Web applications. All of these interfaces are supported in IE11. Check out the Performance Timing Test Drive to see these interfaces in action.

The Performance Timing testdrive lets you try out the Timing APIs.
The Performance Timing Test Drive lets you try out the Timing APIs.

Performance Timeline

The Performance Timeline specification has been published as a W3C Recommendation and is fully supported in IE11 and Chrome 30. Using this interface, you can get an end to end view of time spent during navigating, fetching resources, and executing scripts running in your application. This specification defines both the minimum attributes all performance metrics need to implement and the interfaces developers can use to retrieve any type of performance metric.

All performance metrics must support the following four attributes:

  • name. This attribute stores a unique identifier for the performance metric. For example, for a resource, it will be the resolved URL of the resource.
  • entryType. This attribute stores the type of performance metric. For example, a metric for a resource would be stored as “resource.”
  • startTime. This attribute stores the first recorded timestamp of the performance metric.
  • duration. This attribute stores the end-to-end duration of the event being recorded by the metric.

All of the timing data is recorded in high resolution time using the type DOMHighResTimeStamps, defined in the High Resolution Time specification. Unlike DOMTimeStamps which measure time values in milliseconds from 01 January, 1970 UTC, the high resolution time value is measured in at least microsecond resolution from the start of navigation of the document. For example, if I check the current time using performance.now(), the high resolution time analogous for Date.now(), I would get the following interpretation of the current time:

> performance.now();

4038.2319370044793

 

> Date.now()

1386797626201

This time value also has the benefit of not being impacted by clock skew or adjustments. You can explore the What Time Is It Test Drive to understand the use of high resolution time.

You can use the following interfaces to retrieve a list of the performance metrics recorded at the time of the call. Using the startTime and duration, as well as any other attributes provided by the metric, you can obtain an end-to-end timeline view of your page performance as your customers had experienced.

PerformanceEntryList getEntries();

PerformanceEntryList getEntriesByType(DOMString entryType);

PerformanceEntryList getEntriesByName(DOMString name, optional DOMString entryType);

The getEntries method returns a list of all of the performance metrics on the page, whereas the other methods return specific items based on the name or type. We expect most developers will just use JSON stringify on the entire list of metrics and send the results to their server for analysis rather than processing the information on the client.

Let’s take a closer look at each of the different performance metrics: navigation, resource, marks, and measures.

Navigation Timing

The Navigation Timing interfaces provide accurate timing measurements for each of the phases of navigating to your Web application. The Navigation Timing L1 specification has been published as a W3C Recommendation, with full support since IE9, Chrome 28, and Firefox 23. The Navigation Timing L2 specification is a First Public Working Draft and is supported by IE11.

With Navigation Timing, developers can not only get the accurate end-to-end page load time, including the time it takes to get the page from the server, but also get the breakdown of where that time was spent in each of the networking and DOM processing phases: unload, redirect, app cache, DNS, TCP, request, response, DOM processing, and the load event. The script below uses Navigation Timing L2 to get this detailed information. The entry type for this metric is “navigation,” while the name is “document.” Check out a demo of Navigation Timing on the IE Test Drive site.

<!DOCTYPE html>

<html>

<head></head>

<body>

<script>

 functionsendNavigationTiming() {

   var nt = performance.getEntriesByType(‘navigation’)[0];

   var navigation = ‘Start Time: ‘ + nt.startTime;

   navigation += ‘Duration: ‘ + nt.duration;

   navigation += ‘Unload: ‘ + (nt.unloadEventEnd – nt.unloadEventStart);

   navigation += ‘Redirect: ‘ + (nt.redirectEnd – nt.redirectStart);

   navigation += ‘App Cache: ‘ + (nt. domainLookupStart – nt.fetchStart);

   navigation += ‘DNS: ‘ + (nt.domainLookupEnd – nt.domainLookupStart);

   navigation += ‘TCP: ‘ + (nt.connectEnd – nt.connectStart);

   navigation += ‘Request: ‘ + (nt.responseStart – nt.requestStart);

   navigation += ‘Response: ‘ + (nt.responseEnd – nt.responseStart);

   navigation += ‘Processing: ‘ + (nt.domComplete – nt.domLoading);

   navigation += ‘Load Event: ‘ + (nt.loadEventEnd – nt.loadEventStart);

   sendAnalytics(navigation);

 }

</script>

</body>

</html>

By looking at the detailed time spent in each of the network phases, you can better diagnose and fix your performance issues. For example, you may consider not using a redirection if you find redirect time is high, use a DNS caching service if DNS time is high, use a CDN closer to your users if request time is high, or GZip your content if response time is high. Check out this video for tips and tricks on improving your network performance.

The main difference between the two Navigation Timing specification versions is in how timing data is accessed and in how time is measured. The L1 interface defines these attributes under the performance.timing object and in milliseconds since 01 January, 1970. The L2 interface allows the same attributes to be retrieved using the Performance Timeline methods, enables them to be more easily placed in a timeline view, and records them with high resolution timers.

Prior to Navigation Timing, developers would commonly try to measure the page load performance by writing JavaScript in the head of the document, like the below code sample. Check out a demo of this technique on the IE Test Drive site.

<!DOCTYPE html>

<html>

<head>

<script>

 var start = Date.now();

 

 function sendPageLoad() {

   var now = Date.now();

   var latency = now – start;

   sendAnalytics(‘Page Load Time: ‘ + latency);

 }

 

</script>

</head>

<body onload=’sendPageLoad()’>

</body>

</html>

However, this technique does not accurately measure page load performance, because it does not include the time it takes to get the page from the server. Additionally, running JavaScript in the head of the document is generally a poor performance pattern.

Resource Timing

Resource Timing provides accurate timing information on fetching resources in the page. Similar to Navigation Timing, Resource Timing provides detailed timing information on the redirect, DNS, TCP, request, and response phases of the fetched resources. The Resource Timing specification has been published as a W3C Candidate Recommendation with support since IE10 and Chrome 30.

The following sample code uses the getEntriesByType method to obtain all resources initiated by the element. The entry type for resources is “resource,” and the name will be the resolved URL of the resource. Check out a demo of Resource Timing on the IE Test Drive site.

<!DOCTYPE html>

<html>

<head></head>

<body onload=’sendResourceTiming()’>

<img src=’http://some-server/image1.png’&gt;

<img src=’http://some-server/image2.png’&gt;

<script>

 function sendResourceTiming()

 {

  var resourceList = window.performance.getEntriesByType(‘resource’);

  for (i = 0; i < resourceList.length; i++)

  {

   if(resourceList[i].initiatorType == ‘img’)

   {

    sendAnalytics(‘Image Fetch Time: ‘ + resourceList[i].duration);

   }

  }

 }

</script>

</body>

</html>

For security purposes, cross-origin resources only show their start time and duration; the detailed timing attributes are set to zero. This helps avoid issues of statistical fingerprinting, where someone can try to determine your membership in an organization by confirming whether a resource is in your cache by looking at the detailed network time. The cross origin server can send the timing-allow-origin HTTP header if it wants to share timing data with you.

User Timing

User Timing provides detailed timing information on the execution of scripts in your application, complementing Navigation Timing and Resource Timing which provide detailed network timing information. User Timing allows you to display your script timing information in the same timeline view as your network timing data to get an end to end understanding of your app performance. The User Timing specification has been published as a W3C Recommendation, with support since IE10 and Chrome 30.

The User Timing interface defines two metrics used to measure script timing: marks and measures. A mark represents a high resolution time stamp at a given point in time during your script execution. A measure represents the difference between two marks.

The following methods can be used to create marks and measures:

void mark(DOMString markName);

void measure(DOMString measureName, optional DOMString startMark, optional DOMString endMark);

Once you have added marks and measures to your script, you can retrieve the timing data by using the getEntry, getEntryByType, or getEntryByName methods. The mark entry type is “mark,” and the measure entry type is “measure.”

The following sample code uses the mark and measure methods to measure the amount of time it takes to execute the doTask1() and doTask2() methods. Check out a demo of User Timing on the IE Test Drive site.

<!DOCTYPE html>

<html>

<head></head>

<body onload=’doWork()’>

<script>

 function doWork()

 {

  performance.mark(‘markStartTask1′);

  doTask1();

  performance.mark(‘markEndTask1′);

  performance.mark(‘markStartTask2′);

  doTask2();

  performance.mark(‘markEndTask2′);

 

  performance.measure(‘measureTask1′, ‘markStartTask1′, ‘markEndTask1′);

  performance.measure(‘measureTask2′, ‘markStartTask2′, ‘markEndTask2′);

  sendUserTiming(performance.getEntries());

 }

</script>

</body>

</html>

We want to thank everyone in the W3C Web Performance Working Group for helping design these interfaces and browsers vendors for working on implementing this interface with an eye towards interoperability. With these interfaces, Web developers can truly start to measure and understand what they can do to improve the performance of their applications.

Try out the performance measurement interfaces with your Web apps in IE11, and as always, we look forward to your feedback via Connect.

Thanks,

Jatinder Mann, Internet Explorer Program Manager

25 HTML Speed Tips

net244feathtml5

From minifying your JavaScript to using image sprites, here are 25 top tips for high performance web apps. My article on 25 HTML5 speed tips originally appeared in .Net Magazine Issue 244.

For the past few years I've been part of the IE team at Microsoft looking into ways to improve the online experience. Along the way we've learned a lot about web performance and developed an in-depth understanding of what goes into making sites and apps fast.

Creating high-performance web applications is essential for every web developer, whether we're talking about websites that run on a standards-based web browser or apps for the Windows Store. The goal of the developer is to improve web performance by reducing the following factors:

Display time

The most important objective is what we refer to as 'Display Time'. This has many names across the industry including 'time to glass' and 'primary paint'. Display Time measures the time from when the user performs an action until the user sees the result of that action on the screen. This is the duration from when the user navigates to the site until when the site is visually complete loading.

Elapsed time

Most sites continue to perform work in response to the user action after the content has been displayed to the screen. This may include downloading user data (such as email messaging) or sending analytics back to a provider. From users' perspective, the site may appear loaded. However, significant work is often occurring in the background, which impacts responsiveness.

CPU time

Web browsers are almost exclusively limited by the CPU; the work a browser performs on the CPU and how efficiently that work occurs will make the single largest impact on performance. That's why offloading work to the GPU has made such a significant impact to IE9 and IE10 performance. The amount of CPU time required to perform the action and the CPU efficiency are critical.

Resource utilization

Building a fast browser means ensuring resources across the entire PC work well together. This includes network utilization, memory usage patterns, GPU processing, graphics, memory and hundreds of other dimensions. Since customers run several applications at the same time on their PC, it's important for browsers to responsibly share these resources with other applications.

Power consumption

When utilizing underlying PC hardware, it's important to take power consumption into consideration. The more efficiently a browser uses power, the longer batteries will last in mobile scenarios, the lower the electricity costs for operating the device, and the smaller the environmental impact. Power and performance are complementary goals.

I want to share with you some of the things that I've learned about developing faster websites and apps, and the changes you can make today to improve performance.

There are seven key principles that developers can take into consideration:

  • Respond quickly to network requests
  • Minimize bytes to download
  • Efficiently structure markup
  • Optimize media usage
  • Write fast JavaScript
  • Render in standards mode
  • Monitor what your app is doing

Within these seven principles I've included a number of helpful performance tips below that will make your HTML5 websites and apps run faster (watch a video of a presentation I made for even more HTML5 tips).

Respond quickly to network requests

01. Prevent 3xx redirections

When a user clicks on a link, they expect to receive content as quickly as possible. 3xx redirections can create a 250 millisecond delay in an application. This may seem like a short delay, but it's roughly 10 per cent of page load time. Around 63 per cent of the world's top websites contain 3xx redirections.

02. Use content distribution networks (CDNs)

CDN

With CDNs you can easily geographically locate your data closer to your user. Services such as Azure help with this and can help to reduce the amount of time that content spends travelling between locations. On today's networks, that can save up to as much as 300 milliseconds.

03. Maximize concurrent connections

When people think about a network, they often think of a single pipeline conducting content back and forth. In fact, the browser can make six concurrent connections at a given time, enabling it to download six resources at once. This is possible across multiple domains: by distributing your content you can further increase the number of resources you can download simultaneously. If your website holds images across six or seven domains, you can significantly reduce the amount of time the page takes to load.

04. Understand your network timing

Understand the breakdown of your network timing – navigation timing, resource timing and user timing – and use standards-based APIs available in modern browsers such as IE10, and in Windows Store apps. You can get higher resolution timing information on the navigation of your document. Navigation timing allows you to understand the amount of time your application spends in various phases.

timing

Minimize bytes downloaded

05. Download fewer resources and bytes

The fewer resources you can download the better. Look at the resources you are downloading and work out where you can cut down. The average website today downloads 777kB of data. The vast majority of these bytes are taken up by images, followed by JavaScript and Flash content.

06. Gzip: compress network traffic

The best way to download fewer bytes is to Gzip your content. Most people will get this service for free because of the servers they are using, but many unintentionally turn off this decoding technique.

07. Standard file capitalization

This often catches people by surprise, but the server will pick up on variations in upper/ lower case. The following download requests are two variations of the same request:

Lower case

  1. <img src="icon.png"/>

Title case

  1. <img src="Icon.png"/>

From the web platform perspective, these are two different files. Therefore, two different network requests will be processed as a result.

Efficiently structure markup

08. Avoid quirks mode

Always use a standards-based doctype to avoid quirks mode. Start with <!DOCTYPE html>. The modern web has no place for quirks mode, which was designed so that mid-90s web pages would be usable in turn-of-the-century 'modern' browsers like IE6 and Firefox 2.

Most web pages today end up in quirks mode accidentally, as a consequence of an invalid doctype or extraneous text before the doctype. This can cause strange layout issues that are hard to debug.

09. Avoid inline JavaScript events

Take care that you don't use inline JavaScript events inside your HTML markup. One example of this would be <button onclick="validate()">Validate</button>. This is a practice that breaks the clean separation that should exist between the markup, presentation and the behaviour.

Also, if your scripts are loading at the bottom of the file, it's possible for a user to interact with the page, trigger an event and attempt to call a script that hasn't loaded yet, which will cause an error.

10. Link style sheets at the top of the page

By placing the CSS at the top of the page, the browser will issue that request first and simultaneously block painting until CSS is complete. By placing CSS in the head, images, JavaScript and other time intensive resources can be downloaded later. To that end, we would also advise against linking your CSS at the bottom of the page.

11. Only include the necessary styles

It may seem like a good idea to hold one very large CSS file that's shared across the entirety of your website. Some of the top news sites in the world use this approach today. One news site, for example, has one style sheet with 4,000 rules in it, of which only 5-10 per cent are used in a single page. The browser has to download all these styles, pass them and create internal data structures, most of which will never be used.

12. Link your JavaScript at the bottom of the page

This is common best practice. You should always make sure that the styles and the visuals are downloaded first. That way, the JavaScript can follow later and manipulate the page how it likes. However, if you absolutely have to link the JavaScript in the header, as determined by the CRM system or hosting service you are using, then be sure to use the defer tag.

13. Understand the limits of HTML5 tags

New HTML5 tags like <section>, <header>, and <footer> improve the semantics of markup, but require a special shiv script to run in Internet Explorer 6, 7, and 8, or they won't be recognised. Pages that need to work with these legacy browsers, even when scripts are disabled, cannot use the new HTML5 tags. Using plain <div> elements and classes is often a safer course of action for those cases.

14. Standardize on a single framework

There are a lot of JavaScript frameworks out there and a many of them do the same things in terms of functionality. The browser has to download the JavaScript, pass it and create internal data structures without knowing whether or not you will execute them. By sticking to one, single framework, you will significantly improve your performance.

15. Don't include scripts to be cool

There are a lot of 'cool' scripts out there. It's really easy to include a script file that does something 'cool' in the background, but that script file is competing with resources in your page load. Are you sure it's necessary?

Optimize media usage

16. Minimize the number of images

media

The majority of bytes downloaded are for images. The average number of images on the top 100,000 websites is 58. When you go over 20-30 images, you'll start to see a performance impact. Take a look at all your resources and ask yourself whether you need them or not.

17. Use image sprites

Image sprites can significantly reduce the amount of data that needs to be downloaded. Where possible create image sprites by hand.

sprites

18. Consider which image formats you use

PNG offers the most efficient balance between compatibility, coding size, CPU decoding time and the bytes required for CPU decoding. It also has one of the best compression rates. However, for photographs, JPEG tends to be the better format.

19. Avoid complex SVG paths

Complex SVG paths take the browser a long time to download, pass and construct internal data structures. Where possible, try to generate the most concise SVG path you can. It may take a while, but generating the SVG path by hand really can help.

svg

20. Specify an image preview for HTML5 video

If you don't specify an image preview, the browser has to download the video, figure out what the first frame is, and then display that to the user. By specifying an image preview the browser doesn't need to download the video, only the image. It only starts downloading the video when the user requests it.

preview

21. Minimize media plug-in usage

Despite perceptions, most media plugins aren't as fast as HTML5 at running video. Furthermore, they'll compete with the application for resources.

Write fast JavaScript

22. Minify your JavaScript

Many people will be familiar with this technique. Essentially, you take your JavaScript, remove some characters, and then simplify the variables. As demonstrated below:

Initial (66 characters)

  1. function Sum(number1, number2) {
  2. return (number1 + number 2);
  3. }

Characters removed (54 characters)

  1. function Sum(number1,number2){return
  2. number1+number2;}

Compacted (30 characters)

  1. function Sum(a,b){return a+b;}

You're left with less JS to download, and there's a hidden benefit: runtime performance. With tighter code and smaller variable names, the runtime can look up variables faster, improving download as well as runtime performance.

js

In addition, initialize JavaScript on demand. Don't load your entire JavaScript library during the page load – you can do it dynamically when you need to.

Render in standards mode

23. Use web standards

Not only does using web standards help reduce the cost of development and the complexity of testing across browsers and devices, but it can also achieve some noticeable performance benefits. In fact, we found that sites in IE10 got an average of 30 per cent better page load time when they switched to standards mode. The benefit is similar in IE9 too.

standardsmode

Standards mode is the default rendering mode of all browsers and offers the best implementation of web standards that work the same in all browsers. In addition to standards mode, IE provides compatibility modes to keep websites designed for old versions of IE working. This was particularly helpful in the past when it was a common practice to first detect a browser and then serve code meant only for it. But this is no longer necessary in many cases because the web standards code is displayed similarly in modern browsers, including IE10 and 9, through standards mode. This new practice is commonly known as feature detection and was made popular by Modernizr).

Monitor what your application is doing

24. Combine application timers

Most applications on the web today have 5-10 timers that are running at all times. About half of those timers are dormant. Make sure timers or sequences aren't running unnecessarily. When possible, combine your application timers and, if you can, create one timer that manages all your sequences.

25. Check your app's visibility status

Without knowing the visibility status of your application, you're forced to design it as always visible. Page visibility is a new standards-based API supported in IE10 and Windows Store apps, as well as most modern browsers.

Page visibility allows you to determine the visibility status of the application, so you can reduce activity when you know the application is not visible and, in doing so, save CPU time and increase battery life.

Conclusion

Better site performance can have a big impact on your users. Slow web page loading is a major factor in users abandoning a site, can reduce perceived site credibility and affect product sales. The time savings of some of the tips above may seem relatively small, but, when implemented together, they can create considerable time savings. What's more, they will encourage you to think efficiency first and create apps and sites with the user experience at the forefront.

Performance Matters

I wrote this column on A List Apart on why performance matters and how we can accomplish a lot when working together with industry and community leaders in the W3C.

While a decade ago it may have been okay to go brew a pot of coffee while the computer was loading, today we all expect our software and devices to be fast and responsive. The same holds true for web performance. The load speed and responsiveness of a web application play a critical role in our choice of which applications we want to use. At the same time, how efficiently software runs plays a critical role in device battery life. No one wants to be carrying around a brick.

To build fast web applications, web developers need to be able to qualitatively measure application performance, effectively use their hardware, and most importantly, know which patterns to optimize and which to avoid.

We found that web developers haven’t had the right tools. Left on their own to guess, web developers in many cases only emphasized JavaScript performance and equated JavaScript performance with web performance.

However, web performance is truly a multi-dimensional problem. How quickly the network can download resources, how efficiently the CPU can perform web runtime operations, and the amount of available GPU memory all impact web performance. Let’s take a closer look at an example I gave in a Build presentation involving five real world travel booking web applications: Kayak, Expedia, Priceline, Travelocity, and Orbitz. All of these sites have very similar elements on the page. They all have logos, banner ads, interactive real time flight data, etc. We would expect them all to have very similar performance characteristics.

Figure 1 shows the metrics for each of these sites, anonymized here to avoid poking at anyone. We can see that even though these travel sites have very similar functions, they have all been designed very differently. Some have more bytes coming down the wire, others have more JavaScript.

  Total Size (k) Number of Elements CSS Rules Image Files Script Lines
Site #1 3,697 1,504 1,392 41 77,768
Site #2 2,278 1,100 5,325 29 39,183
Site #3 1,061 2,673 1,105 66 12,643
Site #4 1,812 4,252 1,672 12 10,284
Site #5 1,372 900 3,902 6 38,269

Many developers would assume that the fastest site would be the one with the least number of formatted lines of JavaScript, like Site #4, or the one with the least bytes downloaded, like Site #3. However, that’s not the case. Site #5 is actually the fastest, even though it has more JavaScript and bytes downloaded. Figure 2 breaks down the amount of time each of these sites spent in the different browser subsystems, subsystems all standards-based web browsers have.


Bar chart showing browser subsystem load times in milliseconds for five sites.

Time spent in browser subsystems to load top five travel sites.

There are really two points here. It’s not just about how to most efficiently execute JavaScript, it’s about how all of the browser subsystems can most effectively work together. Without the right tools, developers don’t really know why their applications are slow and what to optimize.

Most web experts realized that there was a need for better interfaces to help developers measure their applications and write faster code. More importantly, there was a need for a forum to talk about performance problems that web developers faced and how to solve them.

At tech conferences, folks like Steve Souders and Arvind Jain from Google, Jason Weber from Microsoft, Jason Sobel from Facebook, and others started discussing performance-related issues, such as an accurate and interoperable way to measure page navigation time. These conversations soon led to the establishment of a working group in the W3C focused solely on performance. It was in the summer of 2010 that Jason Weber from Microsoft and Arvind Jain from Google were selected to co-chair the newly minted W3C Web Performance Working Group, chartered to identify and solve performance issues that developers encountered on the web.

Not everyone was certain all these competitors could work together. The first comment on the IE Blog post announcing this working group, referring to the Google and Microsoft co-chairing of the working group, was “I bet the Richter scale could pick up the tension in that room.” Fast forward to 2013 (and ignoring the fact that the Richter scale isn’t an actual instrument), and the Web Performance Working Group can really be seen as an example of the ideal working group. In just three years, Microsoft, Google, Mozilla, Intel, DynaTrace, and others in this group have designed APIs to help developers measure their web applications more accurately and precisely than ever before (Navigation Timing, Resource Timing, User Timing, Performance Timeline, High Resolution Time), and more efficiently schedule activities in web applications (requestAnimationFrame, Page Visibility, Efficient Script Yielding). Based on the conversations from the W3C Workshop on Web Performance, which this working group put together to hear from 45 performance experts and web developers from 21 organizations, the working group is now actively working on six new specifications to solve other performance problems: efficiently prioritizing the download of resources (Resource Priorities), asynchronously transferring data without blocking page unloads (Beacon), getting detailed network error and availability information (Navigation Error Logging, Resource Error Logging), marking links to be instantly loaded (Prerender), standardizing on a performance metric format (HAR—HttpArchive), and improving existing interfaces (Navigation Timing L2, High Resolution Time L2). The High Resolution Time specification went from an idea to fully implemented by all major browser vendors and fully standardized in just 11 months. That must be some sort of a record!

Looking back, the W3C was the perfect forum to bring together the top browser vendors, web properties, and performance experts to discuss and try to solve major performance issues. As web developers design more efficient web applications and browser vendors design runtimes that can execute those web applications more efficiently, we will all benefit from a faster and more responsive web browsing experience.

Beyond the Web Performance Working Group, the W3C has plans to work with the community to continue to enhance browsers, develop tools, improve education, and promote performance in W3C specifications.

Let’s keep innovating. Follow the performance conversation in the Web Performance Working Group!

Using Hardware to Decode and Load JPG Images up to 45% faster in Internet Explorer 11

In this IE Blog article, I talk about how IE11 uses hardware more efficiently to decode and load JPG images up to 45% faster!

Internet Explorer 11 and Windows Store Apps on Windows 8.1 offload parts of the image decoding pipeline to the graphics hardware, resulting in up to 45% faster image load, up to 40% lower memory consumption, and improved battery life. Images on average account for the most bytes downloaded on the Web today. To improve the performance of loading JPG images, IE11 has streamlined its JPG decoding by moving some steps of the decoding process directly to the GPU where it can be done significantly faster and in parallel.

JPG Image Format

Images account for 61% of bytes downloaded on the Web today, and 47% of image requests are for JPG images. By using hardware more efficiently to decode JPG images, IE11 now loads JPG images up to 45% faster and uses up to 40% less memory over previous IE versions.

To understand these performance improvements, let’s start by looking at how JPG images are typically encoded. The first step of JPG encoding is to convert the bitmap from the RGB color space to YCbCr color space.

JPG encoding process.

RGB defines color in three color components: red, green, and blue. If we break down this image of the Grand Tetons in to its RGB color channels, you can see that there is a high degree of detail in each of the channels. The amount of memory required by IE for an image in the RGB color model can be calculated by multiplying image width x image height x 32 bits.

RBC Channels of a JPG image

Image of the Grand Tetons with its Red, Blue, and Green components

YCbCr (commonly referred to as YUV color space) defines a color space in terms of one luma (Y) and two chroma (CbCr) components. The luma component represents the brightness information of a color. The chroma components contain the color differences, with Cb containing the blue difference and Cr containing the red difference. The figure below shows the same image of the Grand Tetons broken down into its Y, Cb, and Cr channels.

YCbCr channels of JPG image

Image of the Grand Tetons with its Y, Cb, Cr channels

The next step in the encoding process is to compress the image size using a lossy compression known as chroma subsampling. The chroma channels can be significantly compressed because the human eye is more sensitive to the brightness of an image and less sensitive to color or hue. For example, looking at the Cb and Cr channels, the last two on the right, you can see that there is very little information contained in these channels relative to the Y channel. The level of subsampling is typically expressed using a three part ratio, e.g., 4:2:0. The first part of the ratio represents a horizontal sampling reference, and the second two parts represent the number of chrominance samples in the first and second rows of that sampling reference. Chroma is typically subsampled at one of three levels:

  • 4:4:4, where the chroma data isn’t subsampled
  • 4:2:2, where the chroma data is downscaled horizontally by 2,
  • 4:2:0, where the chroma data is downscaled horizontally and vertically by 2.

A 4:2:0 subsampled YCbCr JPG image can use up to 62.5% less memory than the original RGB bitmap! Most JPG images used on the Web today are already in 4:2:2 or 4:2:0 chroma subsampling modes. Most image processing tools will automatically subsample to 4:2:2 or 4:2:0 when using the ‘Save for Web’ option.

Once the image has been subsampled, the discrete cosine transformation, quantization, and Huffman encoding processes are also applied to get the final encoded JPG image.

More Efficiently Using Hardware in IE11

To decode a JPG image, you run the same encoding steps in reverse. Traditionally, IE had always decoded JPG images into RGB bitmaps by running all of the below decoding steps directly on the CPU. At render time, IE would copy the RGB bitmap to the GPU for rendering.

2727.Decode

To more efficiently use the hardware, IE11 now splits the JPG decoding work between the CPU and GPU. IE11 decodes the JPG image into the chroma subsampled YCbCr color space on the CPU, but then does the chroma upsampling and YCbCr to RGB color conversion steps on the GPU at draw time, where it can happen much faster and in parallel. This process frees CPU time to perform other operations, as the CPU is a common bottleneck in modern sites and apps. In addition to the decode time improvements, copying the much smaller YCbCr image to the GPU reduces the amount of memory that is copied and stored on the GPU (a limited resource). Using less CPU and memory also reduces power consumption and increases data locality.

The below CPU chart shows the amount of time it takes to decode and draw a 4:2:0 subsampled YCbCr JPG version of the Grand Tetons image on IE10 on Windows 8. Image decoding took 31.9ms, and total time to draw the image was 81.5ms.

4477.uhtdjiiie-image5_760x169

If we load the same JPG image on IE11 on Windows 8.1, you can see that the decode time is now only 17.9ms, which is an improvement of 44%! The total time to draw the image is now 57.5ms, which is 30% faster.

6507.uhtdjiiie-image6_760x170

Because most JPG images on the Web today are in YCbCr format, IE11 on Windows 8.1 users will experience these improvements automatically. We recommend that developers ensure their JPG images are compressed in 4:2:2 or 4:2:0 chroma subsampling modes to maximize the performance benefits of hardware accelerating the JPG decoding pipeline.

Summary

By using the hardware more efficiently to decode JPG images, IE11 improves the performance of nearly every page you browse, while also improving the power consumption and battery life of your device. Please install the Windows 8.1 Preview from the Windows Store and try IE11. As always, we welcome your feedback, either through the IE11 Send Feedback tool or on Connect.

Jatinder Mann, Internet Explorer Program Manager

Web Performance APIs Rapidly Become W3C Recommendations

In this IE Blog post article, I discuss how the W3C Web Performance APIS are rapidly becoming W3C Recommendations with interoperable support for all major web browsers.

The W3C Web Performance Working Group recently published three specifications as W3C Recommendations with full implementations from all major browser vendors, advancing developers’ ability to accurately measure the performance of Web applications and make the Web faster. Over the last three years, companies including Microsoft, Google, Mozilla, Intel, Facebook, and others have been working towards standardizing the Navigation Timing, High Resolution Time, and Page Visibility interfaces in the Working Group. Rapid adoption of these recommendations demonstrates what’s possible when the industry and community come together through the W3C.

To make the Web faster, developers need the ability to accurately measure the performance characteristics of Web applications and the ability to effectively use the underlying hardware to improve the performance of their applications. To solve these problems, the Web Performance Working Group worked on 15 different specifications that address those issues. The table below shows the maturity level of all the specifications currently edited by the Working Group.

Web Performance Spec Status 5_22_2013

The Navigation Timing, Resource Timing, User Timing, and Performance Timeline specifications help developers accurately measure the timing of the navigation of the document, fetching of resources on the page, and developer script execution. Prior to these APIs, this data wasn’t easily obtainable. Navigation Timing was published as a W3C Recommendation, and all major browser vendors support it. The other three interfaces are currently at the Candidate Recommendation stage awaiting two full implementations from browser vendors. IE10 is currently the only browser that implements all of these interfaces, however, other vendors are working on implementations.

To ensure these performance metrics are measured in the most accurate way possible, the High Resolution Time specification allows developers to measure operations with sub-millisecond accuracy. This interface not only benefits accurate measurements of performance metrics, but also allows better frame rate calculations and synchronization of animations or audio cues. This interface has been published as a W3C Recommendation, with all major browser vendors implementing the performance.now() method defined in the specification.

The Page Visibility API allows for programmatically determining the current visibility state of the page. Developers can use this data to make better CPU- and power-efficiency decisions, e.g., throttling down activity when the page is in the background tab. This specification has also been published as a W3C Recommendation, with all major browser vendors implementing it.

The Timing Control for Script-Based Animations, and Efficient Script Yielding specifications help developers write more CPU- and power-efficient Web applications. The requestAnimationFrame API, from the Timing Control for Script-Based Animations specification, allows for creating more efficient JavaScript animations. All browser vendors fully support this interface, with the Working Group actively working on publishing this specification as a Candidate Recommendation. The setImmediate API, from the Efficient Script Yielding specification, allows developers to efficiently yield control flow to the user agent and receive an immediate callback, efficiently leveraging the CPU. IE10 is the first browser to implement this interface.

This year the Working Group also started to look at new ideas, with editor’s drafts of those ideas currently being discussed in the Working Group. The Beacon API is intended to help scripts asynchronously transfer data to a Web server without blocking the unload event, which can negatively impact the perceived performance of the next navigation. The Resource Priorities API defines a means for Web developers to give the browser hints on the download priority of resources to help improve the page load time. As a corollary to the Timing specs, the Navigation Error Logging and Resource Error Loggingspecifications help developers understand the errors and availability of their applications. The Navigation Timing Level L2 specification adds High Resolution Time and Performance Timeline support to Navigation Timing, and High Resolution Time L2 specification adds Web Worker support. These are just some of the drafts the Working Group is currently defining, with more specification drafts on Prerender and other diagnostics areas forthcoming.

The W3C Web Performance Working Group is a great example of how quickly new ideas can become interoperable standards that developers can depend on in modern HTML5-enabled browsers. Together with industry and community leaders who participate in the Working Group, we hope to continue to make rapid progress on interoperable standards that will help developers make the Web faster.

Thanks,
Jatinder Mann
Internet Explorer Program Manager

W3C Web Performance: Continuing Performance Investments

I discuss the W3C Workshop on Performance in this IE Blog post article.

The W3C Web Performance working group recently held the W3C Workshop on Performance on Thursday, November 8, 2012. The goal was to hear current challenges and proposals for new performance ideas for the working group to consider. There were 45 attendees from 21 organizations, including most browser manufactures (Microsoft, Google, and Mozilla), hardware organizations (Intel, Qualcomm, Nokia, Motorola), network organizations (Cisco, Akamai, F5), and top Web properties (GMail, Google Search, Bing, NetFlix, LinkedIn, Zynga, and more). Details on the presentations and discussions from the workshop can be found in this report.

Providing the ability to accurately measure the performance characteristics of Web applications and create power- and CPU-efficient applications is critical to Web performance. The W3C Web Performance working group worked on achieving those goals in its recently completed second chartered period. In under two years, the working group rapidly standardized and modern HTML5-enabled Web browsers implemented these eight interfaces: Navigation Timing, Resource Timing, User Timing, Performance Timeline, Page Visibility, Timing control for script-based animations, High Resolution Timeand Efficient Script Yielding. Internet Explorer 10 is the first browser to support all eight of these new APIs.

The working group has since been focused on gathering data to understand which areas to focus on in its third chartered period. In addition to the Workshop on Performance, the working group has invited performance experts to its weekly conference calls and has broadly surveyed the performance community on ideas.

Based on all the data gathered these past few months, the Web Performance working group has decided to focus on the following areas in the third chartered period:

  • Timing Metrics The working group will continue to improve the Timing interfaces, Navigation Timing, Resource Timing, User Timing and Performance Timeline. For example, we will consider providing Web workers support in the Timing interfaces and including information on video byte ranges in Resource Timing.
  • Efficient Script Yielding The working group will continue to improve on power- and CPU-efficient APIs, like the setImmediate API defined in the Efficient Script Yielding specification.
  • Prerender The working group will standardized the prerender feature which allows navigations to appear almost “instantly” in cases where the browser has high confidence that a user will visit an URL. The way this feature would work is that the browser will proactively navigate to a Web page in a hidden tab, when it sees the “prerender” link type or has high confidence that user will visit that link. When the user does visit that link, the browser will make the hidden tab visible, giving the perception of instant navigation.
  • Resource Priorities Today, browsers download resources in the priority order that they believe are most efficient in helping the page load occur quickly. However, developers may want to prioritize some resources over others. For example, downloading images above the fold may be of higher priority than those below the fold. Though, developers can give some hints to the browser on download priority, like using the “defer” and “async” attributes in markup, these concepts do not include most resources. To help the browser prioritize downloading resources, the working group is expanding the charter to include interoperable means for developers to give the browser hints on the download priority of resources.
  • Diagnostics Interfaces Developers are interested in learning how to make their Web applications faster and less error prone. The working group is expanding the charter to include interoperable means for developers to get browser diagnostics information on their Web applications. For example, using these interfaces a developer could understand where memory is leaking or what errors users are encountering on their Web applications.
  • Beacon Today, analytics scripts will block the current page from unloading by running in a loop in order to confirm that analytics data has been sent to a Web server. This behavior will delay the navigation to the next page, resulting in user perception of poor performance. To help developers avoid that pattern, the working group is expanding the charter to include an interoperable means for developers to asynchronously transfer data from the browser to a Web server, with a guarantee from the browser that the data will eventually be sent.
  • Display Performance Developers are interested in understanding the performance of their games and animations.

    The working group is expanding the charter to include interoperable means for developers to get frame rate and throughput of the display type of information.

This working group is a great example of how quickly new ideas can become interoperable standards that developers can depend on in modern HTML5-enabled browsers. Together with industry and community leaders who participate in the working group, we hope to continue to make rapid progress on interoperable standards that will benefit developers and everyone who uses the Web.

Jatinder Mann, Internet Explorer, Program Manager