hckr.fyi // thoughts

Chatdown Glob - An Example of Processing Multiple Files Using Glob

by Michael Szul on

For those of you who have been reading this blog for a while, you know that chatbots have been my thing. I got involved early, and as a Microsoft MVP, I've been privy to new things coming down the pipeline thanks to interactions with the product group. This allowed me to hit the ground running with a few of the CLI tools, including Chatdown, a tool for marking up conversations and generating transcripts from them.

The problem with Chatdown is that it's only built to process one file at a time. The README for the project suggests using a file watcher for automatic processing, and I've written in the past about using PowerShell and Visual Studio Code to process multiple files, but those solutions always felt incomplete.

This week, I was invited to join the Bot Builder Community team by Gary Pretty, James Mann, and Arafat Tehsin, who are all building great extensions to the Bot Framework. They wanted to know if I could contribute on the Node.js side, since a lot of my work and tutorials around bots center on the Node.js SDK. I happily jumped on board, and in addition to merging my projects into the organization's repos, I created a new one that I've been dying to mess with: Chatdown Glob.

Chatdown itself can be used as a library inside Node.js applications. The Chatdown CLI is actually just a small bit of JavaScript that imports the Chatdown library and processes text passed in through the command-line arguments. This means it wouldn't take much to create a multi-file processor. The biggest difference is accepting file paths rather than just file names.

Processing text with the Chatdown library is pretty easy. It's one line of code, run inside an async function:

let activities = await chatdown("CHATDOWN_TEXT_GOES_HERE");
    

To process a file, just pass in the file's contents. Both Chatdown and Chatdown Glob use the read-text-file module to read them:

let activities = await chatdown(txt.readSync("PATH_TO_FILE"));
    

The rest is just file handling; Chatdown does all the hard work.
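For reference, here is a minimal standard-library sketch of reading a chat file as text. It assumes that, for this purpose, the important work is UTF-8 decoding plus dropping a leading byte order mark (BOM), which would otherwise arrive as an invisible first character of the Chatdown text. readTextSync is a hypothetical name, not part of Chatdown or read-text-file, and unlike a full text-file reader it does no encoding detection.

```javascript
const fs = require("fs");

// Hypothetical helper (not part of chatdown or read-text-file):
// read a file as UTF-8 and drop a leading byte order mark (U+FEFF) if present.
// Assumption: inputs are UTF-8; no other encodings are detected or converted.
function readTextSync(filePath) {
    const text = fs.readFileSync(filePath, "utf8");
    return text.charCodeAt(0) === 0xfeff ? text.slice(1) : text;
}
```

Under that assumption, it could stand in for the txt.readSync(...) calls in the full application.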

How do you select files across a directory structure? I didn't want to pass in directories themselves. I wanted to be able to process files recursively based on a pattern, like you can with configuration in package.json or tsconfig.json files. This is where the glob module came in handy. It works like this:

glob("**/*.chat", { "ignore": ["**/node_modules/**"] }, (err, files) => {
    ...
});
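To make concrete what that pattern and ignore option select, here is a hand-rolled sketch using only Node's built-in fs and path modules. It is not the glob module and understands only this one pattern (every .chat file at any depth, skipping node_modules); findChatFiles is a hypothetical name. It also assumes glob's default of skipping dot-prefixed files and folders, and it returns forward-slash paths relative to the starting folder.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical, hand-rolled stand-in for
//   glob.sync("**/*.chat", { ignore: ["**/node_modules/**"] })
// It supports only this one pattern; it is NOT the glob module.
function findChatFiles(rootDir, extension = ".chat") {
    const results = [];

    // Depth-first walk, so nested folders are searched at any depth ("**").
    const walk = (dir) => {
        for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
            // Mirror glob's default (dot: false): skip dot-prefixed files and folders.
            if (entry.name.startsWith(".")) continue;
            const fullPath = path.join(dir, entry.name);
            if (entry.isDirectory()) {
                // Mirror the ignore rule: never descend into node_modules.
                if (entry.name !== "node_modules") walk(fullPath);
            } else if (entry.isFile() && path.extname(entry.name) === extension) {
                // Relative, forward-slash paths keep the output stable across platforms.
                results.push(path.relative(rootDir, fullPath).split(path.sep).join("/"));
            }
        }
    };

    walk(rootDir);
    // readdirSync order depends on the filesystem, so sort for a deterministic result.
    return results.sort();
}
```

Writing the walk by hand shows how much the glob module saves you: every new pattern or ignore rule would otherwise mean more special cases like the ones above.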
    

For Chatdown Glob, I needed to be careful about asynchronous calls since I had to loop over and process files, and I didn't want to prematurely write to standard output or accidentally end the process. As a result, I ended up using the sync() function.

let files = glob.sync("**/*.chat", { "ignore": ["**/node_modules/**"] });
    

That's it. That gives you a specific collection of files. Then it's just a matter of looping over them and sending each one to the Chatdown library before saving the results back out to the filesystem.
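The same premature-output concern applies to the loop itself. A common trap is files.forEach(async ...): forEach ignores the promises its callback returns, so code after the loop runs before any file is finished. The sketch below contrasts that with an awaited for...of loop; processFile, withForEach, and withForOf are hypothetical names, and processFile just waits a few milliseconds instead of calling Chatdown.

```javascript
// Hypothetical stand-in for "convert one file": wait briefly, then record it.
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function processFile(name, log) {
    await delay(10);
    log.push(`processed ${name}`);
}

// Anti-pattern: forEach discards the promises from its async callback,
// so "done" is logged before any file has finished processing.
async function withForEach(files, log) {
    files.forEach(async (f) => { await processFile(f, log); });
    log.push("done");
}

// Awaiting inside a plain loop: files finish one at a time, in order,
// and "done" is logged only after all of them.
async function withForOf(files, log) {
    for (const f of files) {
        await processFile(f, log);
    }
    log.push("done");
}
```

The full application's for loop with await inside it follows the second pattern, which is why its success message only prints after every transcript has been written.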

The full application looks something like this:

#!/usr/bin/env node

const glob = require("glob");
const minimist = require("minimist");
const chatdown = require("chatdown");
const chalk = require("chalk"); // chalk v5+ is ESM-only; require() needs chalk v4 or earlier
const path = require("path");
const fs = require("fs-extra");
const txt = require("read-text-file");

async function runProgram() {
    const args = minimist(process.argv.slice(2));
    // 1st positional argument: a glob pattern (not a directory), e.g. "dialogs/**/*.chat".
    const pattern = (args._.length > 0) ? args._[0] : "**/*.chat";
    // 2nd positional argument: output directory; path.resolve handles "./out", "out", and absolute paths.
    const outputDir = path.resolve(process.cwd(), (args._.length > 1) ? args._[1] : "./");
    const files = glob.sync(pattern, { ignore: ["**/node_modules/**"] });
    let failures = 0;
    for (const file of files) {
        try {
            // "dialogs/my.demo.chat" -> "my.demo": drop the folder and only the last extension.
            const fileName = path.parse(file).name;
            // Awaiting inside the loop converts one file at a time, in order.
            const activities = await chatdown(txt.readSync(file));
            const writeFile = path.join(outputDir, `${fileName}.transcript`);
            await fs.ensureFile(writeFile); // also creates any missing folders
            await fs.writeJson(writeFile, activities, { spaces: 2 });
        }
        catch (e) {
            // Report the failing file, then keep going with the rest.
            failures++;
            process.stderr.write(`${chalk.red(`${file}: ${e}`)}\n`);
        }
    }
    return failures;
}

runProgram()
    .then(failures => {
        if (failures > 0) {
            process.stderr.write(chalk.red(`Failed to convert ${failures} of the matched files.\n`));
            process.exit(1);
        }
        process.stdout.write(chalk.green("Successfully wrote transcript files.\n"));
        process.exit(0);
    })
    .catch(e => {
        process.stderr.write(`${chalk.red(e)}\n`);
        process.exit(1);
    });
    

You can install it via npm install chatdown-glob -g.

I tried to keep the style, code structure, and dependency tree similar to those of the Chatdown application.

This is, of course, a rough version of a glob application, and there is a ton of room for improvement, but it's a great start, and it shows how globs can make your life a lot easier when selecting files across a directory tree.