Collaborative Writing on GitHub

GitHub also offers a powerful platform for collaborative projects. As co-authors, we composed the text of these book chapters and all of the sample code templates on GitHub. Jack started each day by “pulling” the most recent version of the book from our shared GitHub account to his local computer using GitHub Desktop, where he worked on sections and “pushed” his commits (that is, his edits) back to GitHub. At the same time, Ilya “pulled” the latest version and “pushed” his commits back to GitHub as well. Each of us can see the commits the other made, line by line in green and red (showing additions and deletions), by selecting the GitHub repo Code tab and clicking on one of the commits.

Although GitHub does not operate like Google Documents, which displays live edits, the platform has several advantages when working collaboratively with code. First, since GitHub tracks every commit we make, it allows us to go back and restore a very specific past version of the code if needed. Second, when GitHub repos are public, anyone can view your code and submit an “issue” to notify the owner about an idea or problem, or send a “pull request” of suggested code edits, which the owner can accept or reject. Third, GitHub allows collaborators to create different “branches” of a repo in order to make edits, and then “merge” the branches back together if desired. Occasionally, if two or more coders attempt to push incompatible commits to the same repo, GitHub will warn about a “Merge Conflict,” and ask you to resolve these conflicts in order to preserve everyone’s work.

File Organization and Headers

We organized the GitHub repository for this book as a set of .Rmd files, one for each chapter. As co-authors, we are careful to work on different chapters of the book, and to regularly push our commits to the repo. Only one of us regularly builds the book with Bookdown to avoid code merge conflicts.

Bookdown assigns a default ID to each header, which can be used for cross-references. The default ID for # Topic is {#topic}, and the default ID for ## Section Name is {#section-name}, where spaces are replaced by dashes. But we do not rely on default IDs because they might change due to editing or contain duplicates across the book.
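The default-ID rule can be approximated in a few lines of Python. This is only a sketch, assuming that the default IDs follow Pandoc's automatic header identifier rules (lowercase the title, drop most punctuation, join words with dashes, and start at the first letter). The helper name default_header_id is hypothetical and is not part of Bookdown.

```python
import re


def default_header_id(title: str) -> str:
    """Approximate the default ID for a header title (hypothetical helper)."""
    text = title.lower()
    # Keep only letters, digits, whitespace, underscores, hyphens, and periods.
    text = re.sub(r"[^\w\s.-]", "", text)
    # Join the remaining words with single dashes.
    text = "-".join(text.split())
    # An ID may not start with a digit or punctuation: skip to the first letter.
    for i, ch in enumerate(text):
        if ch.isalpha():
            return text[i:]
    return ""


print(default_header_id("Topic"))         # topic
print(default_header_id("Section Name"))  # section-name
```

Because the ID is derived from the title, rewording a header silently changes its ID, which is one reason to assign IDs by hand.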

Instead, we manually assign a unique ID to each first- and second-level header in the following way. Note that the {-} symbol, used alone or in combination with a space and a unique ID, prevents auto-numbering in the second- through fourth-level headers:

# Top-level chapter title {#unique-name}
## Second-level section title {- #unique-name}
### Third-level subhead {-}
#### Fourth-level subhead {-}

Also, we match the unique ID keyword to the file name for top-level chapters this way: 01-keyword.Rmd to keep our work organized. Unique names should contain only alphanumeric characters (a-z, A-Z, 0-9) or dashes (-).
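A quick check of this naming convention might look like the sketch below. It assumes that chapter files follow the NN-keyword.Rmd pattern described above; the function names are hypothetical and chosen only for illustration.

```python
import re

# Only alphanumeric characters (a-z, A-Z, 0-9) or dashes are allowed in an ID.
VALID_ID = re.compile(r"[A-Za-z0-9-]+")
# Assumed chapter file pattern: digits, a dash, the keyword, then ".Rmd".
CHAPTER_FILE = re.compile(r"\d+-(?P<keyword>[A-Za-z0-9-]+)\.Rmd")


def is_valid_id(unique_id: str) -> bool:
    """Return True if the ID uses only alphanumeric characters or dashes."""
    return VALID_ID.fullmatch(unique_id) is not None


def matches_file_name(unique_id: str, file_name: str) -> bool:
    """Return True if file_name looks like NN-<unique_id>.Rmd."""
    match = CHAPTER_FILE.fullmatch(file_name)
    return match is not None and match.group("keyword") == unique_id


print(is_valid_id("unique-name"))                      # True
print(is_valid_id("unique_name"))                      # False (underscore)
print(matches_file_name("keyword", "01-keyword.Rmd"))  # True
```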

Subheaders must have unique names or IDs to avoid Bookdown errors about duplicated references. For repeated subheaders (such as “Summary”), insert a third-level summary subhead at the end of each chapter and give it a unique ID that includes the chapter number, like this: ### Summary {- #summary17}
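Duplicate IDs can be caught before building the book with a small script. The sketch below is one possible approach, not part of Bookdown: it scans header lines for IDs written inside curly braces and reports any ID that appears more than once. The function name and the sample file names are hypothetical, and the scan is naive: it would also match a code-chunk comment that happens to look like a header with an ID.

```python
import re
from collections import defaultdict

# A header line ending with an attribute block, e.g. "### Summary {- #summary17}".
HEADER_ATTRS = re.compile(r"^#{1,6}\s+.*\{([^}]*)\}\s*$")
# An ID token inside the attribute block, e.g. "#summary17".
ID_TOKEN = re.compile(r"#([A-Za-z0-9-]+)")


def find_duplicate_ids(chapters: dict[str, str]) -> dict[str, list[str]]:
    """Map each ID used more than once to the files where it appears."""
    seen = defaultdict(list)
    for file_name, text in chapters.items():
        for line in text.splitlines():
            header = HEADER_ATTRS.match(line)
            if header is None:
                continue
            for unique_id in ID_TOKEN.findall(header.group(1)):
                seen[unique_id].append(file_name)
    return {uid: files for uid, files in seen.items() if len(files) > 1}


# Hypothetical chapter contents that reuse the same summary ID.
chapters = {
    "16-example.Rmd": "# Example {#example}\n### Summary {- #summary}\n",
    "17-another.Rmd": "# Another {#another}\n### Summary {- #summary}\n",
}
print(find_duplicate_ids(chapters))
# {'summary': ['16-example.Rmd', '17-another.Rmd']}
```

In practice the dictionary could be filled by reading each .Rmd file in the repo; renaming the IDs to summary16 and summary17 makes the report empty.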

A special header in this book is the unnumbered header beginning with (APPENDIX), which indicates that all chapters appearing afterwards are appendices. According to Bookdown, the numbering style will appear correctly in HTML and LaTeX/PDF output, but not in Word or ebooks.

# Chapter One

# Chapter Two

# (APPENDIX) Appendix {-}

# Appendix A

# Appendix B

In the Bookdown index.Rmd for the HTML book output and the PDF output, the toc_depth: 2 setting displays chapter and section headers down to the second level in the Table of Contents.

The split_by: section setting divides the HTML pages at the second-level header, which creates shorter web pages with reduced scrolling for readers. The unique ID becomes the file name of each web page, which is stored in the docs subfolder.
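As a small illustration of this naming rule, the sketch below maps a header ID to the path of the page that would be built for it. It assumes the pages carry an .html suffix and that docs is the output folder; the function name page_path is hypothetical.

```python
def page_path(unique_id: str, output_dir: str = "docs") -> str:
    """Return the assumed path of the HTML page built for a header ID."""
    return f"{output_dir}/{unique_id}.html"


print(page_path("unique-name"))  # docs/unique-name.html
```

This also means that changing a header's ID changes the address of its web page, so existing links to that page would break.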

The number_sections setting is true for the HTML and PDF editions, and given the toc_depth: 2, this means that they will display two-level chapter-section numbering (1.1, 1.2, etc.) in the Table of Contents. Note that number_sections must be true to display Figure and Table numbers in x.x format, which is desired for this book. See relevant settings in this excerpt from index.Rmd:

output:
  bookdown::gitbook:
    ...
    toc_depth: 2
    split_by: section
    number_sections: true
    split_bib: true
    ...
  bookdown::pdf_book:
    toc_depth: 2
    number_sections: true

Note that chapter and section numbering do not appear automatically in the MS Word output unless you supply a reference.docx file, as described below.

In the _bookdown.yml settings, all book outputs are built into the docs subfolder of our GitHub repo, as shown in this excerpt:

output_dir: "docs"
book_filename: "HandsOnDataViz"
language:
  label:
    fig: "Figure "
chapter_name: "Chapter "

In our GitHub repo, we set GitHub Pages to publish to the web using main/docs, which means that visitors can browse the source files at the root level, and view the HTML web pages hosted in the docs subfolder. We use the GitHub Pages custom domain setting so that the HTML edition is available at https://HandsOnDataViz.org.

The docs subfolder may also contain the following items, which Bookdown does not generate and which must be added separately:

  • CNAME file for the custom domain, generated by GitHub Pages.
  • .nojekyll hidden empty file to ensure speedy processing of HTML files by GitHub Pages.
  • 404.html custom file that redirects any mistaken web addresses under the domain back to the index.html page.
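The two hand-made files in this list could be created with a short script like the sketch below. It rests on assumptions: the contents of our actual 404.html are not shown here, so the page uses a generic meta-refresh redirect to /index.html, and the function name add_pages_support_files is hypothetical. The CNAME file is left out because GitHub Pages creates it when the custom domain is set.

```python
from pathlib import Path

# Assumed redirect page: a meta refresh that sends the browser to index.html.
NOT_FOUND_PAGE = """<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <meta http-equiv="refresh" content="0; url=/index.html">
    <title>Page not found</title>
  </head>
  <body>
    <p>Page not found. Redirecting to the <a href="/index.html">home page</a>.</p>
  </body>
</html>
"""


def add_pages_support_files(docs_dir: str = "docs") -> None:
    """Create the .nojekyll marker and a redirecting 404.html in docs_dir."""
    docs = Path(docs_dir)
    docs.mkdir(parents=True, exist_ok=True)
    # Empty marker file that tells GitHub Pages to skip Jekyll processing.
    (docs / ".nojekyll").touch()
    # Custom error page that sends visitors back to the home page.
    (docs / "404.html").write_text(NOT_FOUND_PAGE, encoding="utf-8")
```

Calling add_pages_support_files("docs") from the root of the repo writes both files into the docs subfolder; running it again overwrites 404.html and leaves .nojekyll in place.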

One more option is to copy the Google Analytics code for the web book, paste it into an HTML file in the book repo, and include this reference in the index.Rmd code:

output:
  bookdown::gitbook:
    ...
    includes:
      in_header: google-analytics.html