hckr.fyi // thoughts

How to Stream a Zip File to the Browser in Express and Node.js

by Michael Szul on

As I've been working on the front-end of the learning management system (LMS) that my team and I have been building at work, I've had to build out a fair amount of functionality to deal with the ways that students interact with the courses, whether that's course content, course metadata, or course materials. In one particular use case, students can download all the materials for a particular course, for a particular week, for a particular day, or for an individual activity. Some of these materials are PowerPoint documents, some are Word documents, and some are even videos.

The nature of the downloaded materials, and the fact that students need to download multiple items, means that we want to zip those files up. Since the LMS is a web application, we want a download link that streams the contents back to the student on the fly.

I went with the archiver package:

npm install archiver --save
    

…and I abstracted the zipping code into a util.ts file, importing the package at the top:

import * as archiver from "archiver";
    

I'm also putting that import statement in the main index.ts file that I use as the entry point for the Express application and the routing. Let's go over the route first, and then come back to the utility function.

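app.get("/example/:id", async (req, res) => {
    // The :id segment in the path is what populates req.params.id.
    const { id } = req.params;
    const materials = getMaterials(id);
    const archive = archiver("zip");
    // Mark the response as a file download named after the id.
    res.attachment(`${id}.zip`);
    // From here on, whatever the archive produces is written to the response.
    archive.pipe(res);
    const archiveMaterials = await zipMaterials(archive, materials);
    // Signal that no more entries will be appended so the zip can be completed.
    archiveMaterials.finalize();
});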
    

This is a standard Express route where we read the id from the request parameters. The getMaterials() function returns an array of objects, each with a folder property and a url property that points at the API endpoint where that material is located. We then create an archive object by calling archiver() with "zip" as the format. Calling res.attachment() with the zip file name sets the Content-Disposition header, so the browser treats the response as a file download. Since we want to stream the archive back to the browser, we pipe the archive to the Express response before adding anything to it. Then zipMaterials() appends the materials to the archive, and finalize() tells archiver that no more entries are coming so it can finish writing the zip.
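One detail the route glosses over is the file name handed to res.attachment(). If the name comes from a human-readable title, characters such as slashes, colons, or spaces can produce an awkward download name. Here is a minimal sketch of a helper that tidies the name first. The function zipFileName() and the "materials" fallback are hypothetical and not part of the original application.

```typescript
// Hypothetical helper: turn a display name into a conservative zip file name.
function zipFileName(displayName: string): string {
    const cleaned = displayName
        .trim()
        // Replace runs of reserved file name characters and whitespace with "_".
        .replace(/[\\\/:*?"<>|\s]+/g, "_")
        // Drop underscores or dots left dangling at either end.
        .replace(/^[_.]+|[_.]+$/g, "");
    // Fall back to a generic name if nothing usable is left.
    return `${cleaned || "materials"}.zip`;
}
```

In the route, this would be used as res.attachment(zipFileName(title)), where title is whatever name you want the student to see.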

What does the zipMaterials() method look like?

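// Assumption: request is a promise-returning HTTP client such as request-promise-native.
import * as request from "request-promise-native";

export async function zipMaterials(archive: archiver.Archiver, materials: any[]): Promise<archiver.Archiver> {

    // A throw inside a listener propagates to whatever code emitted the event,
    // so asynchronous archive errors surface as uncaught exceptions.
    archive.on("error", (e) => {
        throw new Error(e.message);
    });

    for (const material of materials) {
        const folder = material.folder;
        const url = material.url;
        // Take everything after the last "/" and drop any query string.
        const fileName = url.substring(url.lastIndexOf("/") + 1).split("?")[0];
        ...
        // encoding: null makes request resolve with a Buffer instead of a string.
        const opts = {
            uri: url,
            method: "GET",
            encoding: null,
        };
        // A failed download rejects the promise returned by zipMaterials().
        const res = await request(opts);
        // The folder prefix must end with "/" for the file to land in a folder.
        archive.append(res, { name: `${folder}${fileName}` });
    }
    return archive;
}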
    

A couple of quick notes: I've trimmed some code out of this. Also, as I'm sure you've noticed by now, I'm using TypeScript, but apart from the type declarations, all of this code will work in JavaScript as well. The materials parameter is an array of objects that each have folder and url properties. In this example, we're actually pulling the files from externally hosted systems, so we download each one first and then add it to the archive. If this were a production application, you'd probably be pulling from blob storage or a local filesystem, but this is a good example because it shows how we use request to download a file and then pass the result immediately to the archive. Setting encoding to null tells request() to return the body as a binary Buffer rather than decoding it into a string.
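Since the materials array is typed as any[], nothing stops an entry with a missing url from reaching the download loop. Here is a minimal sketch of how the shape could be written down and checked at runtime. The Material interface, isMaterial(), and validMaterials() are hypothetical names introduced for illustration. Only the folder and url properties come from the code above.

```typescript
// Hypothetical shape for the objects returned by getMaterials().
interface Material {
    folder: string;
    url: string;
}

// Type guard: narrows an unknown value to Material at runtime.
function isMaterial(value: unknown): value is Material {
    if (typeof value !== "object" || value === null) {
        return false;
    }
    const candidate = value as Record<string, unknown>;
    return typeof candidate.folder === "string" && typeof candidate.url === "string";
}

// Keep only well-formed entries so the zipping loop never sees a missing url.
function validMaterials(values: unknown[]): Material[] {
    return values.filter(isMaterial);
}
```

With something like this in place, zipMaterials() could accept Material[] instead of any[], and the compiler would flag typos such as material.uri.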

In the actual application I pulled this example from, the folder is really a course session name, and there are several materials for each session, so we group them by folders named after the session.

As far as the zip archive is concerned, we set the error handler so that archive errors surface, loop over the materials array, download each file, and then use the append() method to add it to the zip. Notice how I prepend the folder to the file name in the name property of the options object passed to append(). As long as the folder value ends with a slash, this adds the file as fileName inside a folder in the archive named after the folder value, which gives us the grouping that we want.
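The entry name logic is easy to pull out and test on its own. Here is a minimal sketch, assuming folder values may or may not carry a trailing slash. The helpers fileNameFromUrl() and entryName() are hypothetical and not part of the original code. Unlike the inline version, this one strips the query string before looking for the last slash, so a slash inside the query string cannot end up in the file name.

```typescript
// Extract the last path segment of a URL, ignoring any query string or fragment.
function fileNameFromUrl(url: string): string {
    const withoutQuery = url.split(/[?#]/, 1)[0] ?? url;
    return withoutQuery.substring(withoutQuery.lastIndexOf("/") + 1);
}

// Join folder and file name with exactly one "/" so the zip shows a real folder.
function entryName(folder: string, url: string): string {
    // Remove leading and trailing slashes so the join below never doubles them.
    const cleanFolder = folder.replace(/^\/+|\/+$/g, "");
    const fileName = fileNameFromUrl(url);
    return cleanFolder ? `${cleanFolder}/${fileName}` : fileName;
}
```

The result can be passed straight to append() as the name option, for example archive.append(res, { name: entryName(folder, url) }).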

Once our loop is done, we return the archive, which takes us back to the Express route. As you'll recall, the route then calls finalize() on the returned object, which is being piped back to the end user. On the end user's side, the normal download process starts.