Reporting With R Markdown Datacamp Answers



When working on data science problems, you might want to set up an interactive environment to work and share your code for a project with others. You can easily set this up with a notebook.


In other cases, you’ll just want to communicate about the workflow and the results that you have gathered for the analysis of your data science problem. For a transparent and reproducible report, a notebook can also come in handy.

That's right; notebooks are perfect for situations where you want to combine plain text with rich text elements such as graphics, calculations, etc.

R And The Jupyter Notebook

Contrary to what you might think, Jupyter doesn’t limit you to working solely with Python: the notebook application is language agnostic, which means that you can also work with other languages.

There are two general ways to get started on using R with Jupyter: by using a kernel or by setting up an R environment that has all the essential tools to get started on doing data science.

Running R in Jupyter With The R Kernel

As described above, the first way to run R is by using a kernel. If you want to have a complete list of all the available kernels in Jupyter, go here.

To work with R, you’ll need to install the IRkernel package and register it with Jupyter before you can use R in the notebook environment.

First, you'll need to install some packages. Make sure that you do this in a regular R terminal rather than in your RStudio console; otherwise you'll get an error.

The installation command will prompt you to type in a number to select a CRAN mirror for the necessary packages. Enter a number and the installation will continue.

Then, you still need to make the R kernel visible to Jupyter, which IRkernel does through its installspec() function.
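A minimal sketch of these two steps, assuming the CRAN release of IRkernel suits your setup and that Jupyter is already installed; the mirror URL is only an illustrative choice:

```r
# Run this in R started from a terminal, not from the RStudio console.
# Assumption: the CRAN release of IRkernel is acceptable for your setup.
if (!requireNamespace("IRkernel", quietly = TRUE)) {
  # Passing repos explicitly skips the interactive CRAN mirror prompt.
  install.packages("IRkernel", repos = "https://cloud.r-project.org")
}

# Register the R kernel for the current user so that Jupyter can list it.
IRkernel::installspec(user = TRUE)
```

Passing user = TRUE keeps the kernel registration local to your account, which avoids needing administrator rights.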

Now open up the notebook application with jupyter notebook. You'll see R appearing in the list of kernels when you create a new notebook.

Using An R Essentials Environment In Jupyter

The second option to quickly work with R is to install the R essentials bundle into your current conda environment.

These 'essentials' include the packages dplyr, shiny, ggplot2, tidyr, caret, and nnet. If you don't want to install the essentials in your current environment, you can instead create a new conda environment just for the R essentials.

Now open up the notebook application to start working with R.

You might wonder what you need to do if you want to install additional packages to elaborate your data science project. After all, these packages might be enough to get you started, but you might need other tools.

Well, you can either build a Conda R package for the library that you need.

Or you can install the package from inside of R via install.packages() or devtools::install_github() (to install packages from GitHub). You just have to make sure to add the new package to the R library that Jupyter actually uses.
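One way to check which library that is: run .libPaths() in a notebook cell and pass the directory you want to install.packages() explicitly. The helper install_for_kernel() below is hypothetical (it is not part of any package), and the mirror URL and package name are placeholders:

```r
# List the library directories this R installation searches, in order.
lib_dirs <- .libPaths()
print(lib_dirs)

# Hypothetical helper (not part of any package): install into an explicit
# library so the package lands where the Jupyter R kernel will look for it.
install_for_kernel <- function(pkg, lib = lib_dirs[1]) {
  install.packages(pkg, lib = lib, repos = "https://cloud.r-project.org")
}

# Example call, commented out so that nothing is installed by accident:
# install_for_kernel("lubridate")
```

If the paths printed in the notebook differ from those printed in your regular R session, the two are using different libraries, and the lib argument decides where the package ends up.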

If you want to know more about kernels or about running R in a Docker environment, check out this page.

Adding Some R Magic To Jupyter

A huge advantage of working with notebooks is that they provide you with an interactive environment. That interactivity comes mainly from the so-called 'magic commands'.

These commands allow you to switch from Python to command line instructions or to write code in another language such as R, Julia, Scala, …

To switch from Python to R, you first need to install the rpy2 package, which provides the R magic.

After that, you can get started with R, or you can easily switch from Python to R in your data analysis with the %R magic command.

Let's demonstrate how the R magic works with a small example:
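As a small sketch, assuming rpy2 is installed and its extension has been loaded in the Python notebook with %load_ext rpy2.ipython, a cell that starts with %%R runs its body as R. The body below is plain R with made-up numbers, so it can also be tried in an ordinary R session:

```r
# Body of a hypothetical %%R cell: everything here is ordinary R.
# The data are invented for illustration.
x <- c(2, 4, 6, 8)
y <- c(1.9, 4.1, 6.2, 7.8)

# Fit a simple linear model and pull out the slope.
fit <- lm(y ~ x)
slope <- unname(coef(fit)["x"])
print(round(slope, 3))
```

The same magic also exists in a single-line form, %R, for running one R statement from an otherwise Python cell.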

If you want more details about Jupyter, on how to set up a notebook, where to download the application, how you can run the notebook application (via Docker, pip install or with the Anaconda distribution) or other details, check out our Definitive Guide.

The R Notebook

Up until recently, Jupyter seems to have been a popular solution for R users, next to notebooks such as Apache Zeppelin or Beaker.

Also, other alternatives to report results of data analyses, such as R Markdown, Knitr or Sweave, have been hugely popular in the R community.

However, this might change with the recent release of the R or R Markdown Notebook by RStudio.

You see it: the context of the R Markdown Notebook is complex, and it's worth looking into the history of reproducible research in R to understand what drove the creation and development of this notebook. Ultimately, you will also realize that this notebook is different from others.

R And The History of Reproducible Research

In his talk, J.J. Allaire confirms that the efforts in R itself for reproducible research, the efforts of Emacs to combine text, code, and input, the Pandoc, Markdown and knitr projects, and computational notebooks have been evolving in parallel and influencing each other for many years. He confirms that all of these factors have eventually led to the creation and development of notebooks for R.

Firstly, computational notebooks have quite a history: since the late 80s, when Mathematica’s front end was released, there have been a lot of advancements. In 2001, Fernando Pérez started developing IPython, but it was only in 2011 that the team released version 0.12 of IPython. The SageMath project began in 2004. After that, there have been many notebooks. The most notable ones for the data science community are Beaker (2013), Jupyter (2014) and Apache Zeppelin (2015).

Then, there are also the markup languages and text editors that have influenced the creation of RStudio's notebook application, namely, Emacs, Markdown, and Pandoc. Org-mode was released in 2003. It’s an editing and organizing mode for notes, planning and authoring in the free software text editor Emacs. Six years later, Emacs org-R was there to provide support for R users. Markdown, on the other hand, was released in 2004 as a markup language that allows you to format your plain text in such a way that it can be converted to HTML or other formats. Fast forward another couple of years, and Pandoc was released. It's a writing tool and a basis for publishing workflows.

Lastly, the efforts of the R community to make sure that research can be reproducible and transparent have also contributed to the rise of a notebook for R. Sweave was introduced in 2002 to allow the embedding of R code within LaTeX documents to generate PDF files. These PDF files combined the narrative and analysis, graphics, code, and the results of computations. Ten years later, knitr was developed to solve long-standing problems in Sweave and to combine features that were present in other add-on packages into one single package. It’s a transparent engine for dynamic report generation in R. knitr allows any input language and any output markup language.

Also in 2012, R Markdown was created as a variant of Markdown that can embed R code chunks and that can be used with knitr to create reproducible web-based reports. The big advantage was and still is that it isn’t necessary anymore to use LaTeX, which has a steep learning curve. The syntax of R Markdown is very similar to the regular Markdown syntax but does have some tweaks to it, as you can include, for example, LaTeX equations.

R Markdown Versus Computational Notebooks

R Markdown is probably one of the most popular options in the R community to report on data analyses. It's no surprise whatsoever that it is still a core component in the R Markdown Notebook.

And there are some things that R Markdown and notebooks share, such as the delivering of a reproducible workflow, the weaving of code, output, and text together in a single document, supporting interactive widgets and outputting to multiple formats. However, they differ in their emphases: R Markdown focuses on reproducible batch execution, plain text representation, version control, production output and offers the same editor and tools that you use for R scripts.

On the other hand, the traditional computational notebooks focus on outputting inline with code, caching the output across sessions, sharing code and outputting in a single file. Notebooks have an emphasis on an interactive execution model. They don’t use a plain text representation, but a structured data representation, such as JSON.

That all explains the purpose of RStudio's notebook application: it combines all the advantages of R Markdown with the good things that computational notebooks have to offer.

That's why R Markdown is a core component of the R Markdown Notebook: RStudio defines its notebook as 'an R Markdown document with chunks that can be executed independently and interactively, with output visible immediately beneath the input'.

How To Work With R Notebooks

If you’ve ever worked with Jupyter or any other computational notebook, you’ll see that the workflow is very similar. One thing that might seem very different is the fact that now you’re not working with code cells anymore by default: you’re rather working with a sort of text editor in which you indicate your code chunks with R Markdown.

How To Install And Use The R Markdown Notebook

The first requirement to use the notebook is that you have the newest version of RStudio available on your PC. Since notebooks are a new feature of RStudio, they are only available in version 1.0 or higher of RStudio. So, it’s important to check if you have a correct version installed.

If you don’t have version 1.0 or higher of RStudio, you can download the latest version here.

Then, to make a new notebook, you go to the File tab, select 'New File', and you'll see the option to create a new R Markdown Notebook. If RStudio prompts you to update some packages, just accept the offer and eventually a new file will appear.

Tip: double-check whether you’re working with a notebook by looking at the top of your document. The output should be html_notebook.

You’ll see that the default text that appears in the document is in R Markdown. R Markdown should feel pretty familiar to you, but if you’re not yet quite proficient, you can always check out our Reporting With R Markdown course or go through the material that is provided by RStudio.

Note that you can always use the gear icon to adjust the notebook's working space: you have the option to expand, collapse, and remove the output of your code, to change the preview options and to modify the output options.

This last option can come in handy if you want to change the syntax highlighting, apply another theme, adjust the default width and height of the figures appearing in your output, etc.

From there onwards, you can start inserting code chunks and text!

You can add code chunks in two ways: through the keyboard shortcut Ctrl + Alt + I or Cmd + Option + I, or with the insert button that you find in the toolbar.

What's great about working with these R Markdown notebooks is the fact that you can follow up on the execution of your code chunks, thanks to the little green bar that appears on the left when you're executing large code chunks or multiple code chunks at once. Also, note that there's a progress bar on the bottom.

Talking about code execution: there are multiple ways in which you can execute your R code chunks.

You can run a code chunk or run the next chunk, run all code chunks below and above; but you can also choose to restart R and run all chunks or to restart and to clear the output.

Note that when you execute the notebook's code, you will also see the output appearing on your console! That might be a rather big difference for those who usually work with other computational notebooks such as Jupyter.

If there are any errors while the notebook's code chunks are being executed, the execution will stop, and there will appear a red bar alongside the code piece that produces the error.

You can suppress the halt of the execution by adding error = TRUE to the chunk options, for example with a chunk header such as {r, error = TRUE}.

Note that the error will still appear, but that the notebook's code execution won't be halted!

How To Use R Markdown Notebook’s Magic

Just like with Jupyter, you can also work interactively with your R Markdown notebooks. It works a bit differently from Jupyter, as there are no real magic commands; to work with other languages, you need to add separate Bash, Stan, Python, SQL or Rcpp chunks to the notebook.

These options might seem quite limited to you, but this is compensated by the ease with which you can add these types of code chunks with the toolbar's insert button.

Working with these code chunks is also easy: you can see an example of SQL chunks in this document, published by J.J. Allaire. For Bash commands, you just type the command. There's no need for extra characters such as ‘!’ to signal that you're working in Bash, as there is in Jupyter.

How To Output Your R Markdown Notebooks

Before you render the final version of a notebook, you might want to preview what you have been doing. There's a handy feature that allows you to do this: you'll find it in your toolbar.

Click on the 'preview' button and the provisional version of your document will pop up on the right-hand side, in the 'Viewer' tab.

By adding some lines to the first section on top of the notebook, you can adjust your output options, for example to request PDF output in addition to the notebook itself.

PDF output requires a LaTeX distribution. To see where you can get one, you can just try to knit, and the console output will give you the sites where you can download the necessary packages.

Note that this is just one of the many options that you have to export a notebook: there's also the possibility to render GitHub documents, Word documents, Beamer presentations, etc. These are the output options that you already had with regular R Markdown files. You can find more info here.

Tips And Tricks To Work With R Notebook

Besides the general coding practices that you should keep in mind, such as documenting your code and applying a consistent naming scheme, code grouping and name length, you can also use the following tips to make a notebook awesome for others to use and read:

  • Just like with computational notebooks, it might be handy to split large code chunks or code chunks that generate more than one output into multiple chunks. This way, you will improve the general user experience and increase the transparency of a notebook.
  • Make use of the keyboard shortcuts to speed up your work. You will find most of them in the toolbar, next to the commands that you want to perform.
  • Use the spellchecker in the toolbar to make sure your report's vocabulary is correct.
  • Take advantage of the option to hide your code if a notebook is code-heavy. You can do this through code chunk options or in the HTML file of the notebook itself!

The R Notebook Versus The Jupyter Notebook

Besides the differences between the Jupyter and R Markdown notebooks that you have already read above, there are some more things.

Let's compare Jupyter with the R Markdown Notebook!

There are four aspects that you will find interesting to consider: notebook sharing, code execution, version control, and project management.

Notebook Sharing

The source code for an R Markdown notebook is an .Rmd file. But when you save a notebook, an .nb.html file is created alongside it. This HTML file is an associated file that includes a copy of the R Markdown source code and the generated output.

That means that you need no special viewer to see the file, while you might need it to view notebooks that were made with the Jupyter application, which are simple JSON documents, or other computational notebooks that have structured format outputs. You can publish your R Markdown notebook on any web server, GitHub or as an email attachment.

There also are APIs to render and parse R Markdown notebooks: this gives other frontend tools the ability to create notebook authoring modes for R Markdown. Or the APIs can be used to create conversion utilities to and from different notebook formats.

To share the notebooks you make in the Jupyter application, you can export the notebooks as slideshows, blogs, dashboards, etc. You can find more information in this tutorial. However, there are also the default options to generate Python scripts, HTML files, Markdown files, PDF files or reStructuredText files.

Code Execution

R Markdown Notebooks have options to run a code chunk or run the next chunk, and to run all code chunks below and above. In addition to these options, you can also choose to restart R and run all chunks or to restart and clear the output.

These options are interesting when you’re working with R because the R Markdown Notebook allows all R code pieces to share the same environment. However, this can prove to be a huge disadvantage if you’re working with non-R code pieces, as these don’t share environments.

All in all, these code execution options add a considerable amount of flexibility for users who have been struggling with the code execution options that Jupyter offers, even though these are not all that different: in the Jupyter application, you have the option to run a single cell, to run cells and to run all cells. You can also choose to clear the current or all outputs. The code environment is shared between code cells.

Version Control

There have been claims that Jupyter messes up the version control of notebooks or that it's hard to use git with these notebooks. Solutions to this issue are to export the notebook as a script or to set up a filter to fix parts of the metadata that shouldn't change when you commit or to strip the run count and output.

The R Markdown notebooks seem to make this issue a bit easier to handle: they have associated HTML files that save the output of your code, and because the notebook files themselves are essentially plain text files, version control will be much easier. You can choose to only put your .Rmd file on GitHub or your other versioning system, or you can also include the .nb.html file.

Project Management

As the R Markdown Notebook is native to the RStudio development kit, the notebooks will seamlessly integrate with your R projects. Also, these notebooks support other languages, including Python, C, and SQL.

On the other hand, the Jupyter project is not native to any development kit: in that sense, it will cost some effort to integrate this notebook seamlessly with your projects. But this notebook still supports more languages and will be a more suitable companion for you if you’re looking to use Scala, Apache Toree, Julia, or another language.

Alternatives to Jupyter or R Markdown Notebooks

Apart from the notebooks that you can use as interactive data science environments which make it easy for you to share your code with colleagues, peers, and friends, there are also other alternatives to consider.

Because sometimes you don't need a notebook, but a dashboard, an interactive learning platform or a book, for example.

You have already read about options such as Sweave and knitr in the second section. Some other options that are out there are:

  • Even though this blog post has covered R Markdown to some extent, you should know that you can do so much more with it. For example, you can build dashboards with flexdashboard.
  • Or you can use Bookdown to quickly publish HTML, PDF, ePub, and Kindle books with R Markdown.
  • Shiny is a tool that you can also use to create dashboards. To get started with Shiny, go to this page.
  • In an educational setting, DataCamp Light might also come in handy to create interactive tutorials on your blog or website. If you want to see DataCamp Light at work, go to this tutorial, for example.

Learning Objectives

  • What is RMarkdown?
  • What are the uses of RMarkdown?
  • Creating HTML reports using knitr

For any experimental analysis, it is critical to keep detailed notes for the future reproduction of the experiment and for the interpretation of results. For laboratory work, lab notebooks allow us to organize our methods, results, and conclusions to allow for future retrieval and reproduction. Computational analysis requires the same diligence, but it is often easy to forget to completely document the analysis and/or interpret the results in a transparent fashion.

For analyses within R, RStudio helps facilitate reproducible research with the use of R scripts, which can be used to save all code used to perform a particular analysis. However, we often don’t save the version of the tools we use in a script, nor do we include or interpret the results of the analyses within the script.

Wouldn’t it be nice to be able to save/share the code with collaborators along with tables, figures, and text describing the interpretation in a single, cleaned up report file?

The knitr package, developed by Yihui Xie, is designed to generate reports within RStudio. It enables dynamic generation of multiple file formats from an RMarkdown file, including HTML and PDF documents. Knit report generation is now integrated into RStudio, and can be accessed using the GUI or console.

In this workshop we will become familiar with both knitr and the RMarkdown language. Before we delve into the details we will start with an activity to show you what an RMarkdown file looks like and the HTML report once you have used the knit() function.

Activity 1

  1. Create a new project in a new directory called rmd_workshop
  2. Download this RMarkdown file and save within the rmd_workshop project directory
  3. Download and uncompress this data folder within the project directory
  4. Open the .rmd file in RStudio
  5. knit the markdown

    Note: If you run into an error when knitting the markdown, make sure your data structure is set up properly as below:

    • The data folder is in the same directory as workshop-example.rmd file
    • Two files (counts.rpkm.csv and mouse_exp_design.csv) are located inside the data folder

RMarkdown basics

The Markdown language for formatting plain text format has been adopted by many different coding groups, and some have added their own “flavours”. RStudio implements something called “R-flavoured markdown” or “RMarkdown” which has really nice features for text and code formatting as described below.

As RMarkdown grows as an acceptable reproducible manuscript format, using knitr to generate a report summary is becoming common practice.

Text

The syntax to format the text portion of the report is relatively easy. You can easily get text that is bolded, italicized, bolded & italicized. You can create “headers” and “sub-headers” by placing an “#” or “##” and so on in front of a line of text, generate numbered and bulleted lists, add hyperlinks to words or phrases, and so on.

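Before we move on to formatting and adding code chunks, here is the syntax in brief: wrap text in double asterisks for bold (**bold**), single asterisks for italics (*italics*), and triple asterisks for both (***both***); write hyperlinks as [link text](URL); and start list items with "1." for numbered lists or "-" for bulleted lists.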

You can also get more information about text formatting here and here.

Code chunks

The basic idea is that you can write your analysis workflow in plain text and intersperse chunks of R code delimited with a special marker (```). Backticks (`) commonly indicate code and are also used for formatting on GitHub.

Each individual code chunk should be given a unique name. knitr isn’t very picky how you name the code chunks, but we recommend using snake_case for the names whenever possible.

There is a handy Insert button within RStudio that allows for the insertion of an empty R chunk if desired.

Additionally, you can write inline R code enclosed by single backticks (`) containing a lowercase r (like ``` code chunks). This allows for variable returns outside of code chunks, and is extremely useful for making report text more dynamic. For example, you can print the current date inline within the report with this syntax: `r Sys.Date()` (no spaces).
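To see what such an inline expression contributes, it can help to evaluate it in the console first. In this sketch, n_samples is a made-up variable; in a report, the same values would be written inline as `r Sys.Date()` and `r n_samples`:

```r
n_samples <- 12  # hypothetical value, for illustration only

# knitr replaces each inline expression with its value converted to text,
# so the rendered sentence resembles the string built here.
sentence <- paste0(
  "This report was generated on ", format(Sys.Date()),
  " and covers ", n_samples, " samples."
)
print(sentence)
```

Because the values are computed at knit time, the sentence stays correct when the report is regenerated on another day or with different data.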

As the final chunk in your analysis, it is recommended to run the sessionInfo() function. This function will output the R version and the versions of all libraries loaded in the R environment. The versions of the tools used are important information for reproducing your analysis in the future.
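A minimal sketch of what such a final chunk might contain; the variable names are only illustrative:

```r
# Collect details about the current R session.
info <- sessionInfo()

# Printing shows the R version, platform, and attached packages with versions.
print(info)

# Individual fields can be pulled out, e.g. for a sentence in the report.
r_version <- info$R.version$version.string
print(r_version)
```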

Activity 2

  1. Add a new section header in the same size as the “Project details” header at the end
  2. Next, add a new code chunk below it to display the output of sessionInfo()
  3. Modify the Author and Title parameters at the top of the script
  4. knit the markdown

Options for code chunks

The knitr package provides a lot of customization options for code chunks, which are written in the form of tag=value.

There is a comprehensive list of all the options available; however, when starting out this can be overwhelming. Here, we provide a short list of some options commonly used in chunks:

  • echo = TRUE: whether to include R source code in the final document. If echo = FALSE, R source code will not be written into the final document. But the code is still evaluated and its output will be included in the final document
  • eval = TRUE: whether to evaluate/execute the code
  • include = TRUE: whether to include R source code and its output in the final document. If include = FALSE, nothing (R source code and its output) will be written into the final document. But the code is still evaluated and plot files are generated if there are any plots in the chunk
  • warning = TRUE: whether to preserve warnings in the output, as when running R code in a terminal (if FALSE, all warnings will be printed in the console instead of the output document)
  • message = TRUE: whether to preserve messages emitted by message() (similar to warning)
  • results = 'asis': output as-is, i.e., write raw results from R into the output document instead of LaTeX-formatted output. Another useful value for this option is 'hide', which will hide the results, or all normal R output

The setup chunk

The setup chunk is a special knitr chunk that should be placed at the start of the document. We recommend storing all library() loads required for the script and other load() requests for external files here. In our RMarkdown templates, such as the bcbioRnaseq differential expression template, we store all the user-defined parameters in the setup chunk that are required for successful knitting.

Global options

knitr allows for global options to be set on all chunks in an RMarkdown file. These options should be placed inside your setup chunk at the top of your RMarkdown document. They become the default options for all the code chunks in the document; however, they can be overridden for each code chunk.
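As a sketch, the body of a setup chunk could look like the following. It assumes the knitr package is installed; the chunk header {r setup, include = FALSE} and the particular defaults chosen are illustrative rather than a recommendation:

```r
# Contents of a setup chunk, e.g. one with the header {r setup, include = FALSE}.
# Assumes the knitr package is installed.
library(knitr)

# Defaults for every later chunk; a chunk header can still override them.
opts_chunk$set(
  echo = TRUE,      # show source code in the report
  warning = FALSE,  # keep warnings out of the report
  message = FALSE   # keep messages out of the report
)

# Confirm what was set.
str(opts_chunk$get(c("echo", "warning", "message")))
```

A later chunk with the header {r, echo = FALSE} would then hide its code while still inheriting the other two defaults.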

An additional cool trick is that you can save opts_chunk$set settings in ~/.Rprofile and these knitr options will apply to all of your RMarkdown documents, not just one.

Activity 3

  1. Only some of the code chunks have names; go through and add names to the unnamed code chunks.
  2. For the code chunk named data-ordering do the following:
    • First, add a new line of code that displays a small part of the newly created data_ordered data frame using head()
    • Next, modify the options for ({r data-ordering}) such that the output from the new line of code shows up in the report, but not the code
  3. Without removing the last code chunk (for boxplot) from the Rmd file, modify its options such that neither the code nor its output appear in the report
  4. knit the markdown

Figures

A neat feature of knitr is how much simpler it makes generating figures. You can simply return a plot in a chunk, and knitr will automatically write the files to disk, in an organized subfolder. By specifying options in the setup chunk, you can have R automatically save your plots in multiple file formats at once, including PNG, PDF, and SVG. A single chunk can support multiple plots, and they will be arranged in squares below the chunk in RStudio.

There are also a few options commonly used for plots to easily resize the figures in the final report. You can specify the height and width of the figure when setting up the code chunk.
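A sketch of how these figure settings might be set globally in the setup chunk, assuming knitr is installed; the folder name and the sizes are arbitrary examples:

```r
# Figure defaults for the whole document, set from the setup chunk.
library(knitr)

opts_chunk$set(
  dev = c("png", "pdf"),  # write every plot in both formats
  fig.path = "figures/",  # subfolder (and file prefix) for the plot files
  fig.width = 6,          # default width in inches
  fig.height = 4          # default height in inches
)
```

A single chunk can still override the size in its own header, for example {r boxplot, fig.width = 8, fig.height = 5}.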

Tables

knitr includes a simple but powerful function for generating stylish tables in a knit report named kable(). Here’s an example using R’s built-in mtcars dataset:

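|                   |  mpg | cyl | disp |  hp | drat |    wt |  qsec | vs | am | gear | carb |
|:------------------|-----:|----:|-----:|----:|-----:|------:|------:|---:|---:|-----:|-----:|
| Mazda RX4         | 21.0 |   6 |  160 | 110 | 3.90 | 2.620 | 16.46 |  0 |  1 |    4 |    4 |
| Mazda RX4 Wag     | 21.0 |   6 |  160 | 110 | 3.90 | 2.875 | 17.02 |  0 |  1 |    4 |    4 |
| Datsun 710        | 22.8 |   4 |  108 |  93 | 3.85 | 2.320 | 18.61 |  1 |  1 |    4 |    1 |
| Hornet 4 Drive    | 21.4 |   6 |  258 | 110 | 3.08 | 3.215 | 19.44 |  1 |  0 |    3 |    1 |
| Hornet Sportabout | 18.7 |   8 |  360 | 175 | 3.15 | 3.440 | 17.02 |  0 |  0 |    3 |    2 |
| Valiant           | 18.1 |   6 |  225 | 105 | 2.76 | 3.460 | 20.22 |  1 |  0 |    3 |    1 |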
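A table like this one can be produced with a call along the following lines, assuming knitr is installed; head() keeps only the first six rows of mtcars:

```r
library(knitr)

# Build a table from the first six rows of the built-in mtcars data set.
tbl <- kable(head(mtcars))

# Inside a chunk the table is rendered in the report; in the console it
# prints as plain text.
print(tbl)
```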

There are some other functions that allow for more powerful customization of tables, including pander::pander() and xtable::xtable(), but the simplicity and cross-platform reliability of knitr::kable() makes it an easy pick.

Generating the report

Once we’ve finished creating an RMarkdown file containing code chunks, we finally need to knit the report. You can knit it by using the knit() function, or by just clicking on “knit” in the panel above the script as we had done in our first activity in this lesson.

When executing knit() on a document, by default this will generate an HTML report. If you would prefer a different document format, this can be specified in the YAML header with the output: parameter. You can also click on the “Knit” button in the panel above the script to get the various output options.
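The console route can be sketched as follows. The sketch uses rmarkdown::render(), which is what RStudio's Knit button calls behind the scenes, on a throwaway file written to a temporary directory; the file contents are invented for illustration, and rendering is skipped if rmarkdown or pandoc is unavailable:

```r
# Write a tiny R Markdown file to a temporary location (contents are made up).
rmd <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  "title: \"Throwaway demo\"",
  "output: html_document",
  "---",
  "",
  "This report was knit on `r Sys.Date()`."
), rmd)

# Render only when the required tools are present.
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  # output_format overrides the format named in the YAML header.
  out <- rmarkdown::render(rmd, output_format = "html_document", quiet = TRUE)
  print(file.exists(out))
}
```

Changing output_format to another format name, such as "word_document", produces that format from the same source file.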

Note: PDF rendering is sometimes problematic, especially when running R remotely, like on the cluster (Odyssey or O2). If you run into problems, it’s likely an issue related to pandoc.

The RStudio cheatsheet for RMarkdown is quite daunting, but includes more advanced RMarkdown options that may be helpful as you become familiar with report generation, including options for adding interactive plots with RShiny.

Activity 4

  1. Download the linked R script
  2. Download the linked RData object by right-clicking and save to data folder.
  3. Transform the R script into a new RMarkdown file with the following specifications:
    • Create an R chunk for all code underneath each # comment in the original R script
    • Comment on the plots (you may have to run the code from the R script to see the plots first)
    • Add a floating table of contents
  4. knit the markdown

Note1: output formats

RStudio supports a number of formats, each with their own customization options. Consult their website for more details.

The knit() command works great if you only need to generate a single document format. RMarkdown also supports a more advanced function named rmarkdown::render(), which allows for output of multiple document formats. To accomplish this, we recommend saving a special file named _output.yaml in your project root. Our bcbioRnaseq package includes an example of such a file.

Note2: working directory behavior

knitr redefines the working directory of an RMarkdown file in a manner that can be confusing. If you’re working in RStudio with an RMarkdown file that is not at the same location as the current R working directory (getwd()), you can run into problems with broken file paths. Make sure that any paths to files specified in the RMarkdown document are relative to its location, and not your current working directory.

A simple way to make sure that the paths are not an issue is by creating an R project for the analysis, and saving all RMarkdown files at the top level and referring to the data and output files within the project directory. This will prevent unexpected problems related to this behavior.
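As a small sketch of this advice, the two data files from the first activity can be referenced relative to the folder that holds the RMarkdown file and checked before the analysis starts. The variable names are illustrative:

```r
# Paths are written relative to the folder that holds the .Rmd file,
# because knitr evaluates chunks from that folder by default.
counts_path <- file.path("data", "counts.rpkm.csv")
design_path <- file.path("data", "mouse_exp_design.csv")

# Report anything that cannot be found before the analysis starts.
for (p in c(counts_path, design_path)) {
  if (!file.exists(p)) {
    message("Not found relative to ", getwd(), ": ", p)
  }
}
```

Running the same check in the console and inside a knit report shows quickly whether the two are looking in different directories.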

This lesson has been developed by members of the teaching team and Michael J. Steinbaugh at the Harvard Chan Bioinformatics Core (HBC). These are open access materials distributed under the terms of the Creative Commons Attribution license (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.