Why Clinical Laboratorians Should Embrace the R Programming Language


Like many other industries, clinical laboratories have become more reliant on data analytics. Clinical laboratories generate, process, and store transactional data with high quality and efficiency. These data are required for patient care and quality assurance activities and increasingly are used for operational decisions. To analyze these types of data, laboratories often rely on commercial spreadsheets or other specialized software applications. However, these programs can be functionally limited and often are not suitable for more complex statistical analyses and visualizations or for analysis of large or high-dimensional datasets. Importantly, the analysis and visualization workflows in these programs have limited reproducibility and transparency.

In contrast, R is a comprehensive, open source, platform-independent, freely available programming language, and it has a large, international user and contributor base. These characteristics make R ideally suited to clinical laboratorians. Applications of R in medicine, and specifically among clinical laboratorians, are growing due to increased visibility of R’s versatility and the availability of relevant, focused training.

Tools for almost any conceivable application have been written in R and publicly shared, enabling adaptation to a variety of user interests. Further, analyses performed with R are highly customizable and reproducible, and they can be automated. R is heavily utilized for its graphic and reporting capabilities, including the ability to render publication-quality figures with interactivity and to generate web-based dashboards and other reports in a variety of formats.

What Is R, and What Are Its Advantages?

As a statistical programming language, R allows laboratorians and others to transform and analyze data and communicate results. It includes a wide variety of functions that provide greater functionality for working with data than Microsoft Excel and other commercial data analysis programs. R uses text-based commands to process data, and as such it functions as a full-fledged programming language for the advanced user.

Unlike Excel and many other graphical user interface (GUI)-based programs, R’s reliance on text-based structure makes it simple to review at any time the commands used in a data processing pipeline to ensure that the correct steps were taken. Furthermore, the ability to view the underlying commands facilitates transparency and reproducibility of analyses.

The same text commands are used regardless of the size of the dataset; thus, it is just as easy for the user to perform an analysis on 1 million test results as it is to perform that analysis on 10 results. This feature makes it simple to automate and scale any process with R. In addition, the graphing capabilities of R far surpass those of Excel and many other GUI-based programs, in both functionality and potential for customization and automation.

Getting Started With R Programming

Although users can program with R from the command-line interface of a computer, it is common to use an integrated development environment (IDE) like RStudio. RStudio provides a cross-platform (i.e., works the same on Windows and Mac) graphical interface to write and execute code and to configure and manage components of R environments, including data, plots, results, versions, and packages. Similar to other IDEs, RStudio includes several features that make writing and debugging code easier and more efficient. And unlike some other IDEs, it provides simple integration with tools for interactive documentation and dynamic report generation in a variety of formats (e.g., .doc, .pdf, .html).

5 Key Attributes of R

[Figure. Five key attributes of R, numbered 1 through 5]

Open Source

Because R software is freely available and open source, clinical laboratorians can download it for free and deploy it widely in their labs or hospital systems without any licensing fees. Open source means that the underlying code for R can be downloaded and, in principle, edited if necessary. This is important because it ensures that R does not depend on a commercial entity for bug fixes and empowers a large population of developers who are able to audit the underlying code, minimizing the chance for security issues or other source code errors and ensuring ongoing development of the software. Open source software also allows any suitably skilled person to examine exactly how the software works, rather than relying on it as a “black box.” While the ongoing evolution and enhancement of the code base can create compatibility issues over time, programmers offer packages and workflows designed to help with this issue.

A Broad-Based Community of Users

R is widely used and supported within statistical communities and data-driven industries. Given the rapid pace of development in analytical methods, such as machine learning or artificial intelligence, it is important that software packages continually evolve. The two dominant choices for data science as of this writing are R and Python (also freely available), and in both cases it is now possible to adapt code developed on one platform to run on the other. As a result, for almost any statistical method, new or old, there is likely at least one freely available add-on package to implement it within the R environment. These packages are available on popular software-hosting sites, including the Comprehensive R Archive Network (CRAN; https://cran.r-project.org/) and GitHub.

As one example, Holmes and Buhr have recently published work related to extracting reference intervals from laboratory results (1). While they developed a corrected version of the traditional Hoffmann method that runs in R, they were also able to easily implement a statistically superior (and more algorithmically complex) approach using mixture modeling based on freely available code. The ability to use superior, more accurate statistical methods by taking advantage of the massive repository of available add-on packages is a significant advantage for laboratorians.

Importantly, the R community is recognized for its purposeful inclusivity, both in welcoming diversity among members and in fostering new members’ ability to learn the language.

Integrated Tools for Sharing Results

R provides a number of convenient tools for sharing and communicating results with dynamic reporting and the potential for interactivity. In particular, R supports the development of web-based dashboards and user interfaces and applications. There are several methods for creating graphics interfaces in R, including Shiny, a package for creating general web-based interfaces to R programs.

Among other things, these methods make it possible for a laboratorian to develop interactive business intelligence-style dashboards for operational management that would otherwise require commercial software, such as Tableau or QlikView. Custom R-based reports can be widely deployed for use by members of a laboratory who do not have any knowledge of R programming and do not have R installed.

With R, a laboratorian can also readily turn analyses into presentations or high-quality documents. Analysis and reporting can also be automated to occur on a user-defined schedule (e.g., every day at 8 a.m.).

A Perfect Fit for Clinical Laboratory Data

R is ideally suited for the type of data that clinical laboratories typically generate. Most lab datasets are structured in a rectangular format, meaning that variables are in different columns, and samples are in rows. For example, a typical laboratory information system data report might show a different patient result on each row, while columns would list test date and time, patient identification, test name, result, units, reference range, and other data. This format is routinely handled in R as a data frame, and many native tools have been provided in R for processing such data.
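To make the rectangular layout concrete, here is a minimal base R sketch of a small laboratory information system extract held in a data frame. The column names, patient identifiers, results, and reference limits are all invented for illustration and are not taken from any real system.

```r
# Hypothetical LIS extract: one result per row, one variable per column.
# All identifiers, values, and reference limits are illustrative only.
lis_results <- data.frame(
  patient_id   = c("P001", "P001", "P002", "P003"),
  collected_at = as.POSIXct(
    c("2020-01-06 08:15", "2020-01-06 08:15",
      "2020-01-06 09:40", "2020-01-07 14:05"),
    tz = "UTC"
  ),
  test_name    = c("Sodium", "Potassium", "Sodium", "Potassium"),
  result       = c(139, 4.1, 128, 5.9),
  units        = c("mmol/L", "mmol/L", "mmol/L", "mmol/L"),
  ref_low      = c(135, 3.5, 135, 3.5),
  ref_high     = c(145, 5.0, 145, 5.0),
  stringsAsFactors = FALSE
)

# Inspect the structure: column names, types, and the first few values.
str(lis_results)

# Flag results outside the reference range with vectorized column operations.
lis_results$abnormal <- lis_results$result < lis_results$ref_low |
  lis_results$result > lis_results$ref_high

# Keep only the flagged rows and show a few columns of interest.
abnormal_results <- lis_results[lis_results$abnormal, ]
print(abnormal_results[, c("patient_id", "test_name", "result", "units")])
```

The same column-wise expressions would apply unchanged to an extract with many more rows, which is part of what makes the data frame a natural container for this kind of report.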

Even if raw data are not optimally formatted, R excels at transforming data from a variety of formats into rectangular data frames. In fact, some of the most prominent R packages developed over the past few years (the so-called tidyverse packages for dealing with tidy data) are optimized to import, structure, transform, summarize, model, plot, and communicate these types of datasets (2). As a result, laboratorians can more easily perform frequently conducted laboratory data processing tasks, from generating turnaround time reports to looking at global distributions of results by assay. Moreover, they can share results in PDF reports, interactive dashboards, or other formats. R is also heavily utilized in high-dimensional data analyses, common to ’omics, because of its comprehensive and cutting-edge package library of statistical methods. This includes packages specifically built for genomics (e.g., Bioconductor project) and metabolomics (e.g., XCMS, MetaboAnalyst).

Reliable Scalability

R can be scaled for use across the entire healthcare enterprise—from one person downloading it on a personal laptop or workstation to a group of laboratorians, clinicians, or analysts who want to collaborate on a large project. Similarly, if an institution wishes to implement a bioinformatic pipeline or make Shiny dashboards available organization-wide, commercially supported tools and services can be purchased to enable these workflows.

R integrates seamlessly with many other popular data science technologies (e.g., Python, SQL, Spark, TensorFlow, Microsoft PowerBI, GitHub, etc.). Thus, learning R provides a foundation for creating a wide variety of tools that can be scaled anywhere from an individual user to system-wide clinical deployment of a complete data science pipeline.

Examples of R Applications in Clinical Laboratories

Laboratorians have developed R packages to perform many of the routine tasks of assay validation without using commercial software. In addition, R is ideal for many of the calculations and data processing steps that are repeatedly performed in a clinical laboratory.

Suppose, for example, that administrators require that a laboratory report its annual test volumes year-by-year. Using conventional tools like Excel, this can be a time-consuming, error-prone task involving many copy-paste steps with questionable reproducibility and opaque decision-making. Navigating a very large file using Excel may itself be a problem. In contrast, R handles such files with ease, limited only by a computer’s total memory size.

In this example, there may be a number of specific inclusions/exclusions (e.g., include the central laboratory and a subset of satellite laboratories, but exclude certain other satellite labs), and these can typically be handled in a few lines of R code. Tests may also be counted in specific ways (e.g., use white blood cell count as a surrogate for the total number of complete blood counts, and ignore other complete blood count elements, or even more complicated permutations). Such calculations in Excel could require manual processing prone to errors.
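The inclusion and counting rules described above could be sketched in base R roughly as follows. The site codes, test names, and dates are hypothetical, and a real extract would need its own rules; this is only meant to show that each rule becomes one explicit, reviewable line.

```r
# Hypothetical result-level extract; site codes and test names are made up.
results <- data.frame(
  site      = c("CENTRAL", "CENTRAL", "CENTRAL", "SAT_A",
                "SAT_A", "SAT_B", "CENTRAL", "SAT_A"),
  test_name = c("WBC", "HGB", "PLT", "WBC",
                "Sodium", "WBC", "Sodium", "WBC"),
  resulted  = as.Date(c("2018-03-01", "2018-03-01", "2018-03-01", "2018-07-15",
                        "2019-02-10", "2019-02-11", "2019-05-20", "2019-09-02")),
  stringsAsFactors = FALSE
)

# Rule 1 (assumed): include the central lab and satellite A; exclude satellite B.
included_sites <- c("CENTRAL", "SAT_A")

# Rule 2 (assumed): count each complete blood count once, using WBC as the
# surrogate and ignoring the other CBC components.
cbc_components_to_ignore <- c("HGB", "PLT")

counted <- results[results$site %in% included_sites &
                     !(results$test_name %in% cbc_components_to_ignore), ]
counted$test_name[counted$test_name == "WBC"] <- "CBC"
counted$year <- format(counted$resulted, "%Y")

# Tabulate volumes by test and year.
annual_volumes <- as.data.frame(
  table(test = counted$test_name, year = counted$year),
  responseName = "volume",
  stringsAsFactors = FALSE
)
print(annual_volumes)
```

Because the rules live in named vectors at the top, changing which satellite laboratories are included means editing one line rather than repeating a series of manual filtering steps.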

In contrast, R not only efficiently processes complex rules, but also does so in a reproducible way. If an individual makes a copy/paste error in Excel, it may never be detected; however, R code can be reviewed at a later time for correctness, and it can be reapplied to a new dataset in the same format without starting from scratch. This makes R ideal for recurrent tasks such as calculating turnaround times, assessing quality control compliance, tracking population statistics, and other operationally relevant data. Once data have been analyzed, R can be used to communicate the results in various formats.
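As one way to picture a recurrent task written once and reapplied, the sketch below wraps a turnaround time summary in a small function. The function name `summarize_tat`, the column names, and the timestamps are assumptions made for this example only.

```r
# Summarize turnaround time (TAT) in minutes by test.
# Assumes columns named test_name, received_at, and resulted_at
# (hypothetical names chosen for this sketch).
summarize_tat <- function(df) {
  df$tat_min <- as.numeric(
    difftime(df$resulted_at, df$received_at, units = "mins")
  )
  # Median TAT per test; aggregate() returns one row per test_name.
  aggregate(tat_min ~ test_name, data = df, FUN = median)
}

# Illustrative data for one reporting period.
january <- data.frame(
  test_name   = c("Troponin", "Troponin", "Troponin", "Potassium", "Potassium"),
  received_at = as.POSIXct(
    c("2020-01-06 08:00", "2020-01-06 09:00", "2020-01-06 10:00",
      "2020-01-06 08:30", "2020-01-06 11:00"),
    tz = "UTC"
  ),
  resulted_at = as.POSIXct(
    c("2020-01-06 08:42", "2020-01-06 09:35", "2020-01-06 11:05",
      "2020-01-06 08:55", "2020-01-06 11:35"),
    tz = "UTC"
  ),
  stringsAsFactors = FALSE
)

january_tat <- summarize_tat(january)
print(january_tat)
```

Calling `summarize_tat()` on the next period's extract, provided it has the same three columns, repeats exactly the same calculation, and the function body remains available for later review.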

In addition to operational work, R supports clinical laboratories’ needs for more advanced analytics and statistical modeling. There is growing interest in applying artificial intelligence/machine learning approaches to laboratory data in order to predict disease. R packages provide access to every major approach in this area, from straightforward logistic regression to random forests and even deep learning.

We have used R, for example, to analyze hematology analyzer results and build a random forest model that flags samples from patients with myelodysplastic syndrome (3). This required no proprietary software—only the freely available data processing tools from R to load and process data, create training and test datasets, build a random forest model, and plot receiver operating characteristic curves showing the performance of this model on independent datasets. Given the importance of predictive analytics for the future of laboratory medicine, R provides an ideal tool for clinical laboratorians to learn about or experiment with these new analytic techniques.

Resources for New R Programmers

[Table 1. Resources for learning R, listed by format with examples]

The aforementioned user and contributor base has embraced the open source movement. This user community generously creates and shares resources for learning R in a variety of formats (Table 1). Though the available content is largely not specific to laboratory medicine, learners can quickly and easily find educational materials, many of them free, for almost any application of R through an Internet search or exploration of a book.

Translating general R principles for data manipulation, analysis, and visualization to laboratory-related problems is usually straightforward. Though no prior programming experience is needed to learn R, those new to programming might find it challenging at first. That said, R is one of the fastest growing programming languages and is experiencing a surge of interest within pathology and laboratory medicine.

Learning R requires working with data and writing and executing code. The first steps in learning R involve gaining access to R and RStudio. There are several ways to accomplish this, including downloading and installing R and RStudio or initiating a free RStudio Cloud account (rstudio.cloud). R includes many built-in datasets that are commonly used for demonstrations of package functionality and in tutorials.
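For a first session, the built-in datasets can be explored without importing anything. The short sketch below uses `ToothGrowth`, one of the data frames in the `datasets` package that ships with R; the particular summary chosen here is just an example.

```r
# List the names of the datasets shipped in base R's 'datasets' package.
available <- data(package = "datasets")$results[, "Item"]
head(available)

# ToothGrowth is one of the built-in data frames; inspect its structure.
str(ToothGrowth)

# Mean tooth length at each dose level, computed with a one-line formula.
mean_len_by_dose <- aggregate(len ~ dose, data = ToothGrowth, FUN = mean)
print(mean_len_by_dose)
```

Typing `?ToothGrowth` at the console opens the help page that describes the dataset's variables and origin.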

Self-paced education is available through several massive open online course formats and from websites focused on R education, examples of which are listed in Table 1. These resources encompass, for example, comprehensive curricula that teach the basics for using R to wrangle, analyze, and visualize data; modules with targeted instruction for performing a specific analysis in R (e.g., build and validate time series forecast models); and other, more focused tutorials on how to use a particular R package or function (e.g., convert datetime formats).

Help with R is not hard to find. For example, the R-bloggers website (r-bloggers.com) lists tutorials and news related to R. A popular source for troubleshooting and for finding example code is Stack Overflow (https://stackoverflow.com/questions/tagged/r), a question and answer site that is a rich resource for R-related information. Clinical laboratorians also can explore a number of books for learning R (Table 2). R for Data Science by Hadley Wickham forms the foundation of many introductory level short courses and online resources. It is considered a contemporary must-read for those beginning to learn R.

[Table 2. Books for learning R]

Content geared specifically for laboratory medicine professionals is also available with more and more being developed over time. In recent years, AACC and other professional societies have offered short courses designed for learners with varying levels of R experience, and several are planned for the 2020 AACC Annual Scientific Meeting.

Content in these sessions often covers method validation, instrument interfacing, and test utilization reporting. More advanced topics on predictive modeling using laboratory results and database integration have also been presented.

Data Analytics Is in Your Future

We believe that clinical laboratories will require increasing use of data analytics to optimize operations, manage utilization, and provide improved interpretation of complex laboratory data in the context of patients’ medical records. For laboratorians to embrace and thrive in this future, we will need improved tools to process the rapidly changing streams of data that we produce. R provides an excellent format for learning about and, ultimately, implementing the types of computational tools required in a new era of laboratory medicine. Importantly, the skills and computational thinking that a laboratorian acquires by using R also readily translate to other programming languages and informatic approaches. R provides an ideal tool for clinical laboratorians to embrace our data-oriented future.

Shannon Haymond, PhD, DABCC, FAACC, is vice chair for computational pathology at Ann and Robert H. Lurie Children’s Hospital of Chicago and associate professor of pathology at Northwestern University Feinberg School of Medicine. Email: shaymond@luriechildrens.org

Stephen Master, MD, PhD, FAACC, is division chief for laboratory medicine at Children’s Hospital of Philadelphia (CHOP) and associate professor of pathology at the Perelman School of Medicine, University of Pennsylvania. He also serves as director of the Michael Palmieri Laboratory for Metabolic and Advanced Diagnostics at CHOP and holds a joint appointment in the division of pathology informatics. Email: masters@email.chop.edu


1. Holmes DT, Buhr KA. Widespread Incorrect Implementation of the Hoffmann Method, the Correct Approach, and Modern Alternatives. Am J Clin Pathol 2019; 151:328-36.
2. Wickham H, Averick M, Bryan J, et al. Welcome to the tidyverse. J Open Source Softw 2019; 4: 1686.
3. Raess PW, van de Geijn GJ, Njo TL, et al. Am J Hematol 2014; 89:369-74.

