hckr.fyi // thoughts

Converting HTML to a Word Document in NodeJS and JavaScript

by Michael Szul

Online-only isn't truly a thing, and sometimes when you're building a web application, you need to be able to offer your users the ability to download content in different formats. Most common is probably providing a list in Excel format. Also common is providing content in PDF format.

Sometimes, however, you get the semi-odd request, and you might be asked to let users download content in Microsoft Word format. What's more: They often want the content to resemble what was on the page--at least as far as text formatting goes. This means converting HTML to Word on the fly.

For this, we can use the html-docx-js package.

Start by installing the package:

npm install html-docx-js --save
    

Next, include it in your code:

const htmlDocx = require('html-docx-js');
    

Now we just need some HTML to convert. Commonly, you might have rich text editors that allow you to capture rich content from users, and this content is stored in a database with HTML intact. It's just a matter of you pulling that content out of the database and passing it to the module's function.
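One thing to watch: rich text editors usually store fragments, not full pages, and as far as I can tell from the html-docx-js README, the package expects a complete HTML document (DOCTYPE, html, and body tags). Here's a minimal sketch of a wrapper for that. Note that toHtmlDocument() is a hypothetical helper name of my own, not part of the package.

```javascript
// Hypothetical helper (not part of html-docx-js): wraps stored HTML
// fragments in a complete document before conversion.
function toHtmlDocument(fragments) {
    // Join the fragments in order, one per line, inside a single body.
    const body = fragments.join('\n');
    return [
        '<!DOCTYPE html>',
        '<html>',
        '<head><meta charset="utf-8"></head>',
        `<body>${body}</body>`,
        '</html>'
    ].join('\n');
}
```

You would then pass the result to the converter instead of the raw fragment, for example htmlDocx.asBlob(toHtmlDocument([fragment])).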

For example, let's say you're pulling records from a database table, and you want to combine the content areas, and then push that out to the browser as a download.

Note: I'm using Express as the Node.js web framework here.

app.get('/course/:name/:id', async (req, res) => {
    const { name, id } = req.params;
    // Each record holds one page of stored HTML in its "page" property.
    const pages = await getCoursePages(id);
    let html = '';
    for (let i = 0; i < pages.length; i++) {
        html += pages[i].page;
    }
    // In Node, asBlob() returns a Buffer containing the DOCX file.
    const docx = htmlDocx.asBlob(html);
    // Use the Word (wordprocessingml) MIME type, not the Excel one.
    res.setHeader('Content-Type', 'application/vnd.openxmlformats-officedocument.wordprocessingml.document');
    res.setHeader('Content-Disposition', `attachment; filename=${ sluggify(name) }.docx`);
    res.setHeader('Content-Length', docx.length);
    res.send(docx);
});
    

In the above code, you can see that we simply loop over the records returned by the getCoursePages() function and concatenate the page property of each record. Once we have this concatenated string (which contains HTML), we pass it to the asBlob() method of the html-docx-js package. This produces the Microsoft Word DOCX file (in Node, as a Buffer), and once you have that, you can send it to the browser after setting the appropriate headers.
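The route calls a sluggify() helper that isn't shown in this post. Below is a rough sketch of what such a helper might look like, along with a small function that collects the download headers in one place. Both implementations are my assumptions for illustration: docxHeaders() is a hypothetical name, and your own sluggify() may follow different rules.

```javascript
// Hypothetical stand-in for the sluggify() helper used in the route.
function sluggify(text) {
    return String(text)
        .toLowerCase()
        .trim()
        .replace(/[^a-z0-9]+/g, '-') // collapse each non-alphanumeric run into one hyphen
        .replace(/^-+|-+$/g, '');    // drop leading and trailing hyphens
}

// MIME type for Word .docx files.
const DOCX_MIME = 'application/vnd.openxmlformats-officedocument.wordprocessingml.document';

// Hypothetical helper: builds the download headers for a DOCX Buffer.
function docxHeaders(name, buffer) {
    return {
        'Content-Type': DOCX_MIME,
        // Quoting the filename keeps the header valid even for unusual names.
        'Content-Disposition': `attachment; filename="${sluggify(name)}.docx"`,
        'Content-Length': buffer.length
    };
}
```

Since Express's res.set() accepts an object of header fields, the three setHeader() calls could then be written as res.set(docxHeaders(name, docx)).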