What follows is a paint-by-numbers approach to learning to code. We will call it cut-and-paste coding. I’m going to help you “build” a simple document automation system. All you have to do is follow the recipe. Once it is working, I will have you break it into pieces and put it back together. When you’re done, you won’t be a Lovelace, Hopper, or Wozniack, but you’ll know more than you do now, and you will be one step closer to your goal of learning to code, whatever that means to you.
(Remember, your fellow readers have varying motivations. Some of them are interested in coding the next big thing, some see this as a supplement to their weekly crossword, and others are just trying to figure out what all the fuss is about. Consequently, this post is trying to find a sweet spot that satisfies the largest number of people. Consequently, I’m sure to fail someone. In fact, I’ll probably fail everyone a little bit.)
At its heart, document automation is a computer running find and replace over and over again. The system identifies placeholders in some template document and then it replaces them with words drawn from some data source. A good system will make these replacements based on if-then statements of varying complexity. We, however, are not interested in making a good system. We want a simple system, a straightforward and accessible system, a toy to play with. (If you make it to the extra credit assignment, however, I will point you in the direction of making a good system.)
That being said, this post is a bit of a bear. It’s not difficult so much as tedious. Computers are also really picky. They’re even worse than that journal editor you had back in law school. One letter out of place and things stop working. Consider yourself warned.
If you aren’t interested in coding along at home, you can still get something out of this. However, I suggest you read only through Final Thoughts. After that, it’s overkill. If you’re all for the coding, I recommend reading this in two sittings, stopping at the end of Final Thoughts and giving yourself a break before starting the homework. So if the length is freaking you out, don’t worry. There’s an intermission. Also, you can always hit me up for help in the comments or on Twitter @Colarusso.
Here’s what we want our system to do: Take input from an online form (e.g., names) and pre-populate a document with this information (e.g., a cover sheet for your TPS report).
I’ve chosen this as our first exercise for a number of reasons, among them: (1) it’s a stepping stone to the wicked cool project I have planned for next time; (2) it’s relevant to most lawyers’ practice; (3) it’s something people have repeatedly asked me to help them build (mostly in the form of online intake); and (4) most of it can be done in Word.
Wait, what? Word? Microsoft Word?
When we consider technical solutions to problems, it’s important to consider the tools you’ve already paid for. Mail Merge—that feature in Word that creates form letters—is at its core serial find and replace and so a form of document automation.
(The fact that Word can do document automation may cause you to question the value of expensive document automaton services. The truth is you’re not paying them for technical solutions when you buy one of these solutions. You’re paying for people to help turn your documents into templates or methods to help you do this yourself. Don’t get me wrong, these are valuable offerings, but I think it’s important that you know what you’re paying for.)
This makes our task a good deal easier. Since we know Word can take in a set of values and replace a bunch of placeholders, all we need to do is figure out how to get those values from an online form into a format Word can read. (I’m pretending Office 365 doesn’t exist.)
Google Forms is a free service from Google that allows you to make simple online forms. It takes users’ submissions and dumps them into a Google Sheet, Google’s online version of a spread sheet. Word can do a mail merge on an Excel spreadsheet, so there must be a way to make it work with Google. Of course, you could manually download the Form’s Google Sheet as either a Microsoft Excel (
.xlsx) or a
.csv file. Both formats are easy for Word to work with once on the same machine as your copy of Word.
Such a setup would, however, require you to login to Google and download the file after every online form submission. You could save some time by making your spreadsheet public and sharing it as an
.csv file, what Google calls Publish to the web…, and then you could download the file directly without the need for a login. However, this would mean you couldn’t include confidential information as the content would be visible to anyone with the Sheet’s URL.
Ideally, we should be able to securely and automatically export the Sheet’s contents without a person having to click a bunch of menu items. If only there were a way for programs to talk to each other directly. Well, as you may have guessed, there is, and unsurprisingly, it has an acronym: API.
What is an API?
An application programming interface (API) is an interface that lets programs talk to each other. Instead of a user interface where a person is expected to click menu items to get certain results, an API lays out ways in which a computer program can get those same results by specially formatting its requests. Luckily, Google Sheets has an API we can use to download the contents of our Sheet.
Imagine how great it would be if everything had an API. Seriously, imagine if the Court’s docket had an API! Someone could build a program to deliver automatic notifications of changes to the docket, electronic filings would be a given, you could easily sync your calendar with the Court’s, and that’s just the tip of the iceberg. (I work at CPCS, the Massachusetts public defenders, and although this post, and my Lawyerist writings in general, does not reflect the views of my employer, I can say that they think this is an awesome idea, assuming someone works out the details. I can say this because they’re on the record suggesting just that.)
Setting the Stage
I’ve prepared some files so we can hit the ground running, and you’ll need a place to put them. So…
- Create a project folder somewhere on your hard drive. Something like
\Users\[your user name]\Documents\[cool project name]\. Note: When I want to type something that is clearly a place holder, I will place it in brackets to drive the point home. For example, if I were swearing you in, I would say, “Repeat after me. I [state your name] …” Hence, the presence of [your user name] and [cool project name] above.
- Download this .json file and this .docx file into your project folder. (The TPS Report cover sheet was adapted from this fan creation.) Customarily, this is done by right clicking on the link and choosing Save Link As or similar. For mice without a right button, one can often replicate a right click by holding down the control key while clicking. Where the files are saved will depend on your browser’s settings. So you may have to move them to your project folder after saving them.
.json file contains a cryptographic key (like a password) that tells Google that you are who you say you are, in this case, someone with authorization to access a given Google Sheet. The
.docx file is a template for a TPS report coversheet, complete with variables ready and waiting for the all important find and replace of document automation.
For the moment, you are using my key and template. This will allow you (and everyone else reading this) to access my sample Google Sheet via the API. This is for convenience. However, for homework, I will be asking you to get and make your own. Normally, sharing your key with the whole world would defeat the purpose of having a key, but I wanted you to be able to hit the ground running, and as long as this post doesn’t hit the front page of Reddit, we shouldn’t run into problems. (Google limits the number of times the key can be used per 100 seconds. So if a lot of people follow along with this project, it could break, but there would have to be a lot of people, and I mean a lot of people, using it at the same time.)
Standing on the Shoulders of Giants (Open Source Libraries)
There is no need to reinvent the wheel, and thanks to open source computer program libraries, there is no need to do so. I could write a whole post on the values of open source, but the creation of industry-standard no-cost tools is one of the big ones. We need to write a program that talks to Google’s API, but we can’t be the first people that have needed to do that.
Programmers try to avoid duplicate work, something known as DRY (don’t repeat yourself). Consequently, they often write little programs within programs to handle common tasks. Thanks to the open source mindset, these little programs are often bundled into collections called libraries. If you have a need for something and you’re not the first, chances are someone’s written a library.
In this case, we need Python (our chosen language, which you installed in part one of this series) to talk to Google’s API, and it turns out there are actually a few libraries for that. (APIs and the open source ethos were actually at the center of the recent Google v. Oracle case, and as you see their power, you’re probably getting an idea as to why this was such an important case.) We’ll be using gspread. Libraries often have dependencies—other libraries they need to work. So we’re actually going to install a few libraries. Luckily, the process is pretty straightforward.
- Open your terminal (Mac) or command prompt (Windows). If you don’t know how to open terminal/command prompt, here are some instructions for Mac and some for Windows.
pip install gspreadinto the command line (the generic name for the place where you can type in terminal/command prompt) and hit the enter/return key. It may hang for a moment, but this should eventually download the gspread library. Likely, you’ll see a bunch of text fly by, and you may be prompted to download additional libraries. Go ahead and say “yes” if asked to do so.
- Once you get your blinking cursor back, type
pip install --upgrade oauth2clientinto the command line and hit the enter/return key. This should update a library as you likely already have oauth2client installed. If, however, you get an error message saying that it could not find a version that satisfies the requirements or some such, try removing
pip install oauth2clientinstead.
- Once you get your blinking cursor back, type
pip install PyOpenSSLand hit the enter/return key. Again, this should download a library and you may be asked to say “yes” a few times.
- Once you get your blinking cursor back, type
jupyter notebookand hit the enter/return key. This will open Project Jupyter in your default web browser.
We’ll be using Python to download content from this Form. You can use it to submit test content, but be forewarned, I plan to turn off submission either: in about a month or if it becomes clear that people can’t be trusted to keep it “clean”.
- Inside Jupyter (the browser window that just opened), navigate to the project folder you created above and create a new Notebook just like you did in the last project.
- Rename the project [your cool project] by clicking on “Untitled” and interacting with the screen that results.
- Cut and paste the code below into the empty cell. Here’s the gist of what it does: import some libraries (sometimes called modules); connect to Google; prove you are an authorized user; download the contents of our sheet; and write these contents to a
.csvfile.https://gist.github.com/colarusso/b0808e0aa680067f120bdf7613d492ee#file-download_sheet-pyIt should look something like this:
- If you’re using Windows: replace “output.csv” with “[your projects’s location]/output.csv”. for example “C:/Users/dcolarusso/my cool project/output.csv”, and be sure to use slashes, not backslashes.
- Hit the run code button, the one that looks like the next track button on a CD player (right arrow with bar).
- This will run your code, and directly below your cell, you’ll see your output, and if everything works as planned, you should see something like this:
- You should also see a new file in your project directory called
output.csv.This file should contain the same information as that above (plus any new submissions).
- Congratulations! You have written your second Python program. (Assuming that your first was “Hello, World!” from last time and that you didn’t play around any between then and now.) Not only that, you have made your first call to an API. Sure, there were a lot of steps, but they weren’t hard, just tedious, and that’s the point. You can see far when standing on the shoulders of giants. The rapid pace of technological innovation is predicated on the accumulation and sharing of knowledge. It’s not magic, and you don’t have to start from scratch.
- Save your Notebook by clicking the Save button, the one that looks like a disk. This will allow you to take a break and come back to your work later.
- To automate the running of our program, we’ll need to save it as a Python script. To do this, click File > Download as > Python (.py) and save the file to your project folder. We’ll come back to this in a while.
TPS_Report_Cover_Sheet.docx(one of the files you downloaded above). We’re going to attach your
.csvas the data source for use by mail merge. You’ll notice that our cover sheet has placeholders in several places and that they have a special format (e.g. «Project_Name»). These placeholders are what we’ll be replacing with data from our
- Your version of Word may have a slightly different layout. So if the steps here don’t seem to match what you’re seeing, go ahead and Google something like “how to mail merge in Word [your version of Word] [your operating system].” You should be able to find a tutorial. Who knows, maybe even a video tutorial. Also, this might help.Source: “Tech Support Cheat Sheet” from xkcd.
- Navigate to the Mailings tab.
- Click on Select Recipients > Use an Existing List….
- When prompted, find and select your newly created
- Based on your version of Word, you’ll see some type of intermediate screen asking you about the file’s formatting. Most of the time, Word is clever enough to figure things out, but sometimes you need to answer a few questions. Important things you may be asked about: (1) the first row contains column names; and (2) the data is delimited (separated) by commas. When in doubt, go with the defaults.
- If you’ve successfully added your data source, the Filter Recipients and Insert Merge Fields buttons should be un-grayed out.
- Additionally, you should see your heading names in a list if you click on the Insert Merge Field button.
- Save your file. This should make it so you won’t have to add your data source the next time around. So that’s one thing out of the way.
- What follows are the steps you will use in the future when you want to create a new document from your template (i.e., TPS_Report_Cover_Sheet.docx). Here I’m going to recommend different steps depending on your version of Word.
- If you’re using Windows: Click on Edit Recipient List.
This will open a dialogue window that looks a little like a spread sheet. Check the box next to the recipient(s) you’d like to see in your final document. If you have a large list, you can even sort and filter your recipients to help you find the ones you want. The key is that you click the checkbox next to the recipient for which you are creating the custom document then hit OK.
- If you’re using a Mac: Unfortunately, the latest version of Word for Mac doesn’t present the recipient list in the same manner as Windows. I say unfortunately, because it doesn’t give you a nice tabular list like the Windows version. So I find it easiest to turn on Preview Results and and step through the entries until you find the one you want. First click on Preview Results.
You’ll notice that the placeholder text is replaced with data from your
.csvfile. For example, «Project_Name» became Penny Pincher.
Use the arrow keys to the right of Preview Results to step through the items in your data source. You’ll see the number change along with the contents of your template. Here, I’ve settled on Samir’s entry, entry number 3.
Select Current Record from the pull-down adjacent to the Finish & Merge button.
- Both Windows and Mac: Click the Finish & Merge button.
Choose Edit Individual Documents… from the dropdown menu.
If you’re using Windows, you’ll be asked which records (entries) you’d like to use. Choose Current Record and click OK.
- Word should open a new window with a name along the lines of “Documents1” or “Form Letters1.” This is your final output. Save it. Print it. Have fun!
- To make another document, all you have to do is open your template file, choose your recipient(s), and run the mail merge.
To actually save us time, our Python code needs to run automatically. Enter scripts. Above we saved our code to
[my cool project].py, something we call a Python script. Scripts like this can be automated to run at set intervals by using a task scheduler, and this is how we’ll assure that the
.csv on your computer is in sync with the Form’s submissions. Here again, your operating system will dictate next steps. However, in both cases you will need to use the command line. So open up terminal or the command prompt as appropriate.
- If you’re using Windows: In addition to the command prompt, you’ll need a new Notebook. So, open one of those too. In the Notebook’s empty cell, paste the following:
Run your code, and take note of the output. This is the path, or location, or your Python executable file, and we’ll need that info to schedule your script. For the record, it should look something like this “C:\Users\[your user name]\Anaconda3\python.exe” though it may be different, which is why I had you find it. The Notebook has done what we need of it. So lets move to the command prompt. Type and run the following one-line command:
schtasks /Create /SC MINUTE /MO 5 /TN PythonSheets /TR “‘[path to your executable]’ ‘[path to your project folder]\[your cool project].py'”
Remember, the text in brackets are placeholders that you need to fill in. For example, my command would like this:
schtasks /Create /SC MINUTE /MO 5 /TN PythonSheets /TR “‘C:\Users\dcolarusso\Anaconda3\python.exe’ ‘C:\Users\dcolarusso\project\project.py'”
FYI, you may be asked for your password. If everything worked, you should receive a message saying that your task was successfully scheduled, and every five minutes a new
output_mine.csvfile should be written to your hard drive.
The task scheduler we’re using is called schtasks, and you can find its documentation here. Basically, we’ve asked that your script be run every five minutes. To change this frequency, you can swap the 5 out for another number. Additionally, as it stands, this task will open a command prompt every five minutes to run the script. This can get annoying. So you can assign the task to a different user by adding their info with the
/RUtag. For example, you could try this:
schtasks /Create /SC MINUTE /MO 5 /TN PythonSheets /TR “‘[path to your executable]’ ‘[path to your project folder]\[your cool project].py'” /RU “SYSTEM”
You’ll be asked if you want to replace the existing PythonSheets task. Type, “y” and hit enter, and you should be good to go.
If you want to delete this task just run:
schtasks /Delete /TN PythonSheets
- If you’re using a Mac: Type and run the following command into the terminal.
This will open your crontab file, which is probably empty. Hit the i key to inset text, specifically, the following:
*/5 * * * * python “[path to project folder]/[your cool project].py” >/dev/null
Remember, the text in brackets are placeholders that you need to fill in. For example, my file would like this:
*/5 * * * * python “/Users/davidcolarusso/Documents/My Cool Project/project.py” >/dev/null
Hit the esc key to escape insert mode, and then while holding down the shift key hit z twice. This will save you edits. Now enter and run the following five commands one after the other.
echo PATH=$PATH > tmp.cron
echo >> tmp.cron
crontab -l >> tmp.cron
The task scheduler we used is called crontab, and you can find a deeper discussion of how it’s used here, including how to change the frequency of calls to your script. (Those last four commands were done to make sure Python could see our libraries. See http://stackoverflow.com/a/11148907) If you need to turn off these calls, repeat the insert text steps above, but delete the line referencing your script.
- Windows or Mac: It’s worth noting that you could go a step further and fully automate the mail merge by making a new document every time a row was added to your
.csvfile, but that’s a bit beyond the scope of this post. It should, however, be clear how this is just the natural extension of the above.
Remember, this was paint-by-numbers. Hopefully, you have a pretty picture and a sense of accomplishment. In a single sitting you were able to build a very simple document automation system.
Yes, functionally all you did was automate the saving of a file from Sheets to your hard drive, but that’s not really what this was about. Sure, it’s a Frankenstein’s monster of existing tools stitched together by a few lines of Python that for you are nothing more than an unintelligible incantation, but IT’S ALIVE! It gives you a better feel for what animates it than you would if I had tried to explain the concepts of open source libraries and APIs in isolation. Next time, we’ll build on what we’ve done here to construct Twitter bots. That should be fun.
What’s that you say? You didn’t learn anything?
One of the hardest parts about learning a new subject is knowing where to start. I used to run a science themed video explainer series. People would email me science questions, and most of the time I found the thing standing between people and the answer to their questions was nomenclature. They didn’t know what magic words to type in Google. The majority of my replies to them were links to Wikipedia—links they could have found if they had known something’s name. Here, you learned some new names, and they are the names you need to keep moving forward. (Depending on your background, this may have been the first time you’ve worked with: the command line, Python, Jupyter Notebooks, API, open source libraries, scripts, task schedulers, Google Forms, Google Sheets, and perhaps even Word’s mail merge.) Next time, after we build our bots, I’ll share a list of resources to help channel your interests whatever your inclination: Disruptor, Pragmatist, or Liberal Arts Major.
Homework: Build Your Own Online Form and Tie it to Word
Before starting your homework, I suggest letting things sit for a while. What follows is a tad tedious, and you’re more likely to finish if you’re working on a project you find compelling. (The dirty little secret behind most online learning, even the great MOOCS, is this: many start, few finish. Don’t become a statistic!) Think of a document or documents you really want to automate. Once you have something that excites you, that’s when you should continue. (If you need a little extra push, peak at the first link under Extra Credit.) Make sure you have all of your inputs listed and a good idea of what your final document will look like.
Your Own Key
Above, you were able to access the Google API by using my key, but this isn’t sustainable since I’ve literally made the key available to anyone with an internet connection. After all, the reason to use the key is to protect your data. So you’ll need to get your own and keep it secret. Here’s how.
- If you don’t already have one, create a Google Account.
- Sign into your Google Account and visit the Google Developer’s Console. You should see a screen like that below. Note: I’ll be using screen shots I took when making the key I shared above.
- Click on Credentials in the menu on the left.
- Click Create a project, and complete the popup window. I named my project “SheetsBot.”
- You’ll probably have to wait for a while as the project is created. After that, you should see a prompt asking you to create credentials. Click on Create credentials and choose Service account key. If you get bounced back to the home screen, click on Credentials again to see these options.
- You should see a screen like this.
- Choose New service account from the Service account dropdown.
- Give your account a name under Service account name. I called mine read-my-sheet.
- Copy down the Service account ID (i.e., firstname.lastname@example.org). You’ll need this later on.
- Select a role for your key. I choose Project > Viewer because I was sharing it with the world. You could give yours more authority if you like.
- Make sure that JSON is selected as your Key typeand click Create.
- This should result in your browser prompting you to save a
.jsonfile. Save this file to your project folder. After this, you should see the following notice. Of course your file name will be different.
Your Own Form
Above we made use of a simple form I put together, but I’m sure that you’re going to want to collect a different set of data. Which means creating your own form. Here’s how.
- Now we’re going to create your own Google Form. Somewhat counterintuitively, we’ll start by making a Google Sheet. Go to the Google Sheets page.
- Create a new sheet by hitting the box with the big plus symbol.
- Name your spreadsheet by clicking on “Untitled spreadsheet.”
- Click Tools > Create a form.
- Open a new browser tab and visit your sheet. Copy the sheet’s ID (i.e., the long string of text between slashes and right before “/edit”) to your clipboard. You’ll need it soon.
- This should open a new tab with a blank Form. You should be prompted to take a tour. I recommend you take the tour. If you don’t see this prompt, you can click the question mark at the bottom right of the page and choose Take a tour. After the tour, go ahead and create your form. After you’ve created your form, click the Send button.
- This will open a window with several sharing options. Choose the link option and copy the URL. This URL is where users will access the form.
- Fill in some sample values and click SUBMIT as a test.
- Go back to the tab with your Sheet. You should see your test submission.
- If you look at the bottom of the window, these submissions won’t be in Sheet1. Rather, they will be in Form Responses 1 or some such. Take note if the name is different, and click over to the new sheet if you’re still in Sheet1 and don’t see the results.
- Click the Share button in the upper right-hand corner of the window.
- Add the Service account ID from the steps above and click OK. You can set different permission.
- I unchecked Notify people because my API access isn’t a person. This, however, prompted Google to double check. I clicked OK.
- Note: Word can be very particular about the names of mail merge fields. Most of the time it will automatically convert column names into an acceptable format (e.g., replacing spaces with under scores), but sometimes not so much. These names are taken from the names of your questions in Forms. So avoid using special characters and the like.
Your Own Code
With the creation of your own Sheet and the fact that this sheet can only be accessed with your key, it would seem that we need to update your code (the place where we really do our work). I’m assuming that I don’t have to spell out the step-by-steps for things we’ve done before. However, you will have to you still have to tweak your code. Here’s how.
- In Jupyter, open the Notebook you saved earlier.
- Edit the code to reflect your new key and Sheet by doing the following:
- Replace the old ID (i.e.,
1FAIpQLSepHbg9slmVc4elsLhtlrbGtEKlYq-XBulNm1aKFAjBLNVULw) with your new form’s ID (remember you got this from the form’s URL).
- Replace the old.json file (i.e.,
SheetsBot-32718db5596a.json) with the name of your new
output.csvwith a new name like
output_mine.csv. However, if you’re using Windows, include the path to your project. For example “C:/Users/dcolarusso/my cool project/output_mine.csv”, and be sure to use slashes, not backslashes.
- If your results were in a sheet with a name other than
Form Responses 1, replace
Form Responses 1with the proper name.
For reference, here’s the unedited code.
Run your updated code as you have before. You should see the data from your test submission, and a new
.csvfile should be created in your project folder.
- Replace the old ID (i.e.,
- Save your Notebook as a Python script as you did above.
- Schedule the script to run at regular intervals as you did above.
Your Own Template
Above, we made use of a barebones cover sheet, but I’m sure you’d like to automate something more complex. So you’ll need to make your own template. Here’s how.
- Open a blank Word file and using the same steps as above, add your new
.csvfile as the data source. This should un-gray the Insert Merge Field button under the Mailings tab, allowing you to place variables into your document.
- Again, your version of Word may differ in layout. So Google as needed. Generally speaking, you should create the template document you want, place the cursor where variables should go, and select the appropriate variables from the Insert Merge Field dropdown. This will cause the variable’s placeholder to appear (e.g., «Project_Name») in the document ready for that good old find and replace.
- Save your template file, and do mail merges on your data as you like.
Read this page on Making your mail merge “intelligent” by using IF fields, and put it to use creating contingent mail merge behavior. Note: if you have a Mac, you’ll need to hit + F9, not control + F9 to create field braces. If you’d like something a little more visual, however, you can play around with the rules feature by clicking on the Rules button.
That’s right, Word allows you to make “replacements based on if-then statements of varying complexity.”
See? Your system doesn’t have to stay simple.