<?xml version="1.0" encoding="UTF-8" standalone="yes"?><rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title>Quantargo Blog</title><link>https://www.quantargo.com/blog</link><description>Recent content on Quantargo Blog</description><generator>Hugo -- gohugo.io</generator><language>en-us</language><lastBuildDate>Wed, 22 Apr 2026 02:33:32 +0000</lastBuildDate><atom:link href="/index.xml" rel="self" type="application/rss+xml"/><item><title>Vienna&lt;-R 2022 November Meetup (live/virtual)</title><link>https://www.quantargo.com/blog/2022-11-08-viennar-meetup-pdfmole-holiglm</link><pubDate>Tue, 08 Nov 2022 22:00:00 +0000</pubDate><guid>https://www.quantargo.com/blog/2022-11-08-viennar-meetup-pdfmole-holiglm</guid><category>R</category><category>Blog</category><description>&lt;h2&gt;Vienna&lt;-R 2022 November Meetup (live/virtual)&lt;/h2&gt;
&lt;img src="https://cdn.quantargo.com/assets/blog/2022-11-08-viennar-meetup-pdfmole-holiglm/og.png"&gt;
&lt;p&gt;After a long COVID break, we are happy to announce the upcoming ViennaR Meetup on Thursday, November 10! 🙌🎉🥳&lt;/p&gt;
&lt;p&gt;The (live) Meetup is hosted at TU Vienna, the legendary &lt;a href="https://goo.gl/maps/o4xndShSzwEN1LQu7"&gt;Goldenes Lamm&lt;/a&gt;, Seminarraum 107/1 - where some &lt;a href="http://www.ci.tuwien.ac.at/"&gt;R-Core magic happened&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;👉&lt;a href="https://www.meetup.com/viennar/events/289309706"&gt;REGISTER FOR LIVE MEETUP&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Note that this meetup is hybrid and also available virtually via Zoom. If you want to attend virtually, please register separately at &lt;a href="https://www.meetup.com/viennar/events/289309745/" class="uri"&gt;https://www.meetup.com/viennar/events/289309745/&lt;/a&gt;. International guests are welcome!&lt;/p&gt;
&lt;p&gt;👉&lt;a href="https://www.meetup.com/viennar/events/289309745/"&gt;REGISTER FOR VIRTUAL MEETUP&lt;/a&gt;&lt;/p&gt;
&lt;div id="agenda" class="section level3"&gt;
&lt;h3&gt;AGENDA&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;18:00 Doors Open&lt;/li&gt;
&lt;li&gt;18:15 Introduction (15min, Start of Virtual Meetup)&lt;/li&gt;
&lt;li&gt;18:30 pdfmole - Extracting Tables from PDF files (Florian Schwendinger)&lt;/li&gt;
&lt;li&gt;19:15 holiglm - Holistic Generalized Linear Models (Benjamin Schwendinger)&lt;/li&gt;
&lt;li&gt;20:00 (End)&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div id="details" class="section level3"&gt;
&lt;h3&gt;DETAILS&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;pdfmole&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;To read in the data, either&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/FlorianSchwendinger/pdfminer"&gt;pdfminer&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://gitlab.com/schwe/pdfboxr"&gt;pdfboxr&lt;/a&gt; or&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://CRAN.R-project.org/package=tesseract"&gt;tesseract&lt;/a&gt; can be used.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In principle, any package that returns the data in a similar format could be used. The packages &lt;strong&gt;pdfminer&lt;/strong&gt; and &lt;strong&gt;pdfboxr&lt;/strong&gt; can be used if the PDF file already stores the text (which is the case for most PDFs). If the PDF contains only images of the tables, &lt;strong&gt;tesseract&lt;/strong&gt; can be used.&lt;/p&gt;
&lt;p&gt;👉&lt;a href="https://github.com/FlorianSchwendinger/pdfmole"&gt;Github&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;holiglm&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Holistic linear regression extends the classical best subset selection problem by adding additional constraints designed to improve the model quality. These constraints include sparsity-inducing constraints, sign-coherence constraints and linear constraints. The R package holiglm provides functionality to model and fit holistic generalized linear models. By making use of state-of-the-art conic mixed-integer solvers, the package can reliably solve GLMs for Gaussian, binomial and Poisson responses with a multitude of holistic constraints. The high-level interface simplifies the constraint specification and can be used as a drop-in replacement for the &lt;code&gt;stats::glm()&lt;/code&gt; function.&lt;/p&gt;
&lt;p&gt;👉&lt;a href="https://github.com/FlorianSchwendinger/pdfmole"&gt;Github&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Please feel free to join the networking session at a pub nearby.&lt;/p&gt;
&lt;p&gt;Greetings,&lt;/p&gt;
&lt;p&gt;Your ViennaR organizers&lt;/p&gt;
&lt;p&gt;👉&lt;a href="https://www.meetup.com/viennar/events/289309706"&gt;REGISTER FOR LIVE MEETUP&lt;/a&gt; 👉&lt;a href="https://www.meetup.com/viennar/events/289309745/"&gt;REGISTER FOR VIRTUAL MEETUP&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Make code, not war! ✌❤️&lt;/p&gt;
&lt;/div&gt;</description></item><item><title>Dashboard Framework Part 2: Running Shiny in AWS Fargate with CDK</title><link>https://www.quantargo.com/blog/2022-03-11-creating-dashboard-framework-with-aws-part2</link><pubDate>Fri, 11 Mar 2022 12:00:00 +0000</pubDate><guid>https://www.quantargo.com/blog/2022-03-11-creating-dashboard-framework-with-aws-part2</guid><category>R</category><category>Blog</category><description>&lt;h2&gt;Dashboard Framework Part 2: Running Shiny in AWS Fargate with CDK&lt;/h2&gt;
&lt;img src="https://cdn.quantargo.com/assets/blog/2022-03-11-creating-dashboard-framework-with-aws-part2/og.png"&gt;
&lt;p&gt;In the &lt;a href="https://www.quantargo.com/blog/2022-03-07-creating-dashboard-framework-with-aws"&gt;previous post&lt;/a&gt; we outlined the architecture of a dashboard framework for running dashboards based on multiple technologies, including Shiny and Flask, in production. We will now show how to run a basic Shiny dashboard in AWS Fargate behind an Application Load Balancer in fewer than 60 lines of CDK code. To define our stack in a reproducible manner, we use the AWS Cloud Development Kit (CDK) with TypeScript. Starting from a basic CDK stack, we specify the two most important components of our stack:&lt;/p&gt;
&lt;ol style="list-style-type: decimal"&gt;
&lt;li&gt;The Application Load Balancer (ALB) to route traffic to our dashboards.&lt;/li&gt;
&lt;li&gt;The Fargate cluster to run our dashboard tasks in a scalable manner.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The deployed stack will run an example Shiny dashboard behind an Application Load Balancer. Note that the resulting stack runs only a single dashboard and does not use encryption. We’ll add encryption and support for multiple dashboards in the next post. The resulting CDK code can also be downloaded from GitHub at &lt;a href="https://github.com/quantargo/dashboards" class="uri"&gt;https://github.com/quantargo/dashboards&lt;/a&gt;.&lt;/p&gt;
&lt;div id="prerequisites" class="section level3"&gt;
&lt;h3&gt;Prerequisites&lt;/h3&gt;
&lt;p&gt;To run the following code examples make sure to have&lt;/p&gt;
&lt;ol style="list-style-type: decimal"&gt;
&lt;li&gt;an &lt;a href="https://aws.amazon.com"&gt;AWS Account&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;a locally configured AWS account by running e.g. &lt;code&gt;aws configure&lt;/code&gt; with the &lt;a href="https://aws.amazon.com/cli/"&gt;aws CLI&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;a local &lt;a href="https://nodejs.org/en/download"&gt;Node.js installation&lt;/a&gt; (version &amp;gt;= 14.15.0)&lt;/li&gt;
&lt;li&gt;TypeScript: &lt;code&gt;npm install -g typescript&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://aws.amazon.com/cdk"&gt;CDK&lt;/a&gt; (version &amp;gt;= 2.0): &lt;code&gt;npm install -g aws-cdk&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;
&lt;div id="initialize-cdk-and-deploy-first-app" class="section level3"&gt;
&lt;h3&gt;Initialize CDK and Deploy first App&lt;/h3&gt;
&lt;p&gt;To initialize a sample project we first create a project folder and within the folder execute &lt;code&gt;cdk init&lt;/code&gt;:&lt;/p&gt;
&lt;/div&gt;
&lt;pre&gt;mkdir dashboards
cd dashboards
cdk init app --language typescript&lt;/pre&gt;
&lt;div id="section-1" class="section level3"&gt;
&lt;p&gt;This command creates a new CDK TypeScript project and installs all required packages. The following two files are relevant for stack development:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;bin/dashboards.ts&lt;/code&gt;: Main file, which instantiates the CDK stack class. You can explicitly set the environment &lt;code&gt;env&lt;/code&gt; if you use a different account or region for deployment.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;lib/dashboards-stack.ts&lt;/code&gt;: CDK Stack class to which all components of our stack will be added.&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div id="specify-application-load-balancer-alb" class="section level3"&gt;
&lt;h3&gt;Specify Application Load Balancer (ALB)&lt;/h3&gt;
&lt;p&gt;Next, we create an Application Load Balancer (ALB), which is responsible for secure connections and routing. We create a new VPC and add an &lt;code&gt;internetFacing&lt;/code&gt; load balancer to it. This means that the load balancer is accessible from the public internet and is therefore placed in a public subnet. In the &lt;code&gt;lib/dashboards-stack.ts&lt;/code&gt; file we add the following lines:&lt;/p&gt;
&lt;/div&gt;
&lt;pre&gt;// Put imports on top of the file
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import * as elbv2 from 'aws-cdk-lib/aws-elasticloadbalancingv2'

// Put below lines within the DashboardsStack constructor
const vpc = new ec2.Vpc(this, 'MyVpc');

const lb = new elbv2.ApplicationLoadBalancer(this, 'LB', {
  vpc: vpc,
  internetFacing: true,
  loadBalancerName: 'DashboardBalancer'
});&lt;/pre&gt;
&lt;div id="specify-dashboard-cluster" class="section level3"&gt;
&lt;h3&gt;Specify Dashboard Cluster&lt;/h3&gt;
&lt;p&gt;Next, we need to add an ECS cluster to our VPC to run our dashboards efficiently:&lt;/p&gt;
&lt;/div&gt;
&lt;pre&gt;// Put imports on top of the file
import * as ecs from 'aws-cdk-lib/aws-ecs'

// Put below lines within the DashboardsStack constructor
const cluster = new ecs.Cluster(this, 'DashboardCluster', {
  vpc: vpc
});&lt;/pre&gt;
&lt;div id="add-first-fargate-task-definition" class="section level3"&gt;
&lt;h3&gt;Add First Fargate Task Definition&lt;/h3&gt;
&lt;p&gt;We can now add our first Fargate dashboard to the cluster by specifying a task definition. As an example, we use the &lt;a href="https://hub.docker.com/r/rocker/shiny"&gt;rocker/shiny&lt;/a&gt; Docker image, which serves Shiny on port &lt;code&gt;3838&lt;/code&gt;. This requires a corresponding port mapping in the container definition. Additionally, we use &lt;em&gt;half&lt;/em&gt; a virtual CPU (&lt;code&gt;512&lt;/code&gt;; &lt;code&gt;1024&lt;/code&gt; would equal a &lt;em&gt;full&lt;/em&gt; one) and a memory size of 1024 MiB. Adding the Fargate &lt;code&gt;service&lt;/code&gt; completes the specification needed to run our first container in the cluster:&lt;/p&gt;
&lt;/div&gt;
&lt;pre&gt;const taskDefinition = new ecs.FargateTaskDefinition(this, 'TaskDefinition', {
  cpu: 512,
  memoryLimitMiB: 1024,
});

const port = 3838

const container = taskDefinition.addContainer('Container', {
  image: ecs.ContainerImage.fromRegistry('rocker/shiny'),
  portMappings: [{ containerPort: port }],
})

const service = new ecs.FargateService(this, 'FargateService', {
  cluster: cluster,
  taskDefinition: taskDefinition,
  desiredCount: 1,
  serviceName: 'FargateService'
})&lt;/pre&gt;
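&lt;p&gt;As a quick check of the CPU numbers used in the task definition, the following minimal sketch converts vCPUs into CPU units. The helper &lt;code&gt;vcpuToCpuUnits&lt;/code&gt; is a hypothetical name used only for illustration; it is not part of the CDK API:&lt;/p&gt;

```typescript
// Fargate expresses CPU in "CPU units": 1024 units correspond to one vCPU.
const CPU_UNITS_PER_VCPU = 1024;

// Hypothetical helper (not part of the CDK API): convert vCPUs to CPU units.
function vcpuToCpuUnits(vcpu: number): number {
  return Math.round(vcpu * CPU_UNITS_PER_VCPU);
}

console.log(vcpuToCpuUnits(0.5)); // 512, the value used in the task definition
console.log(vcpuToCpuUnits(1));   // 1024, a full vCPU
```

&lt;p&gt;Keep in mind that Fargate only accepts certain CPU and memory combinations, so it is worth checking the AWS documentation before changing these values.&lt;/p&gt;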
&lt;div id="put-service-behind-alb" class="section level3"&gt;
&lt;h3&gt;Put Service Behind ALB&lt;/h3&gt;
&lt;p&gt;Next, we put the Fargate service into an ALB target group so that traffic can be routed through the ALB:&lt;/p&gt;
&lt;/div&gt;
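&lt;pre&gt;// Put imports on top of the file (needed for cdk.Duration)
import * as cdk from 'aws-cdk-lib'

// Put below lines within the DashboardsStack constructor
const tg1 = new elbv2.ApplicationTargetGroup(this, 'TargetGroup', {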
  vpc: vpc,
  targets: [service],
  protocol: elbv2.ApplicationProtocol.HTTP,
  stickinessCookieDuration: cdk.Duration.days(1),
  port: port,
  healthCheck: {
    path: '/',
    port: `${port}`
  }
})&lt;/pre&gt;
&lt;div id="section-7" class="section level3"&gt;
&lt;p&gt;Note that we added two parameters to the ALB target group definition:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;stickinessCookieDuration&lt;/strong&gt;: Since Shiny sessions are stateful, we need to prevent the ALB from switching instances (if there is more than one) during a session. A cookie duration of one day should be sufficient.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;healthCheck&lt;/strong&gt;: The health check must specify the port (as a string), which is also set to the container port 3838.&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
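&lt;p&gt;To illustrate why stickiness matters for stateful Shiny sessions, here is a simplified, self-contained model of duration-based sticky routing. It is only a sketch: the class &lt;code&gt;StickyRouter&lt;/code&gt;, the target names and the round-robin choice are assumptions made for illustration and do not reflect how the ALB works internally.&lt;/p&gt;

```typescript
// Simplified model of duration-based sticky sessions. All names are
// hypothetical; this does not mirror the ALB's internal implementation.
type StickyEntry = { target: string; expiresAt: number };

class StickyRouter {
  private targets: string[];
  private durationMs: number;
  private nextIndex = 0;
  // Maps a session cookie value to the target it is pinned to.
  private sessions: { [cookie: string]: StickyEntry } = Object.create(null);

  constructor(targets: string[], durationMs: number) {
    this.targets = targets;
    this.durationMs = durationMs;
  }

  // Picks the target for a request; `now` is a timestamp in milliseconds.
  route(cookie: string, now: number): string {
    const entry = this.sessions[cookie];
    if (entry !== undefined) {
      if (now >= entry.expiresAt) {
        delete this.sessions[cookie]; // stickiness expired, forget the pin
      } else {
        return entry.target; // still sticky: same target as before
      }
    }
    // New or expired session: choose the next target round-robin and pin it.
    const target = this.targets[this.nextIndex % this.targets.length];
    this.nextIndex += 1;
    this.sessions[cookie] = { target, expiresAt: now + this.durationMs };
    return target;
  }
}

const oneDayMs = 24 * 60 * 60 * 1000;
const router = new StickyRouter(["task-a", "task-b"], oneDayMs);
console.log(router.route("client-1", 0));    // task-a
console.log(router.route("client-2", 0));    // task-b
console.log(router.route("client-1", 1000)); // task-a again, the session is sticky
```

&lt;p&gt;Without the pinning step, the third request could land on a different task and the user would lose the state of their Shiny session.&lt;/p&gt;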
&lt;div id="section-8" class="section level3"&gt;
&lt;p&gt;Finally, we add an HTTP listener which directly forwards all incoming traffic to our dashboard:&lt;/p&gt;
&lt;/div&gt;
&lt;pre&gt;const listener = lb.addListener(`HTTPListener`, {
  port: 80,
  defaultAction: elbv2.ListenerAction.forward([tg1]) 
})&lt;/pre&gt;
&lt;div id="deploy" class="section level3"&gt;
&lt;h3&gt;Deploy&lt;/h3&gt;
&lt;p&gt;Before deployment you should also bootstrap your CDK environment:&lt;/p&gt;
&lt;/div&gt;
&lt;pre&gt;cdk bootstrap&lt;/pre&gt;
&lt;div id="section-12" class="section level3"&gt;
&lt;p&gt;Now the stack should be ready for deployment. As an extra step, you can now check if the stack can be successfully synthesized using&lt;/p&gt;
&lt;/div&gt;
&lt;pre&gt;cdk synth&lt;/pre&gt;
&lt;div id="section-14" class="section level3"&gt;
&lt;p&gt;Any errors popping up during &lt;code&gt;cdk synth&lt;/code&gt; need to be fixed immediately. By continuously using &lt;code&gt;cdk synth&lt;/code&gt; we make sure that the feedback cycles during development are as short as possible. If &lt;code&gt;cdk synth&lt;/code&gt; is successful, we can run&lt;/p&gt;
&lt;/div&gt;
&lt;pre&gt;cdk deploy&lt;/pre&gt;
&lt;div id="section-16" class="section level3"&gt;
&lt;p&gt;Finally, you should see the success message, including the &lt;code&gt;DashboardsStack.LoadBalancerDNSName&lt;/code&gt; output (defined with &lt;code&gt;cdk.CfnOutput&lt;/code&gt; in the full code in the appendix), which you can open directly in the browser:&lt;/p&gt;
&lt;/div&gt;
&lt;pre&gt;Outputs:
DashboardsStack.LoadBalancerDNSName = DashboardBalancer-&lt;9-DIGIT-NUMBER&gt;.&lt;region&gt;.elb.amazonaws.com
Stack ARN:
arn:aws:cloudformation:&lt;region&gt;:&lt;AWS-ACCOUNT-NO&gt;:stack/DashboardsStack/&lt;uuid&gt;

✨  Total time: 297.67s&lt;/pre&gt;
&lt;img src="https://cdn.quantargo.com/assets/blog/2022-03-11-creating-dashboard-framework-with-aws-part2/shiny-app.png"&gt;
&lt;div id="destroy" class="section level3"&gt;
&lt;h3&gt;Destroy&lt;/h3&gt;
&lt;p&gt;If you no longer need the stack, destroy it to reduce cloud costs:&lt;/p&gt;
&lt;/div&gt;
&lt;pre&gt;cdk destroy&lt;/pre&gt;
&lt;div id="conclusion" class="section level3"&gt;
&lt;h3&gt;Conclusion&lt;/h3&gt;
&lt;p&gt;We have shown how to run a first basic Shiny dashboard behind an Application Load Balancer in very few lines of CDK TypeScript code. In the next post we will cover end-to-end encryption through SSL/TLS and host-based routing to add multiple dashboards to the ALB.&lt;/p&gt;
&lt;p&gt;Make code, not war! ✌️&lt;/p&gt;
&lt;/div&gt;
&lt;div id="get-in-touch" class="section level3"&gt;
&lt;h3&gt;Get in Touch&lt;/h3&gt;
&lt;p&gt;Interested in creating your own dashboard framework or other data science cloud stacks? Just get in &lt;a href="https://www.quantargo.com/for-business/consulting#contact"&gt;touch&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;E-Mail: &lt;a href="mailto:info@quantargo.com"&gt;info@quantargo.com&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id="appendix---full-code" class="section level3"&gt;
&lt;h3&gt;Appendix - Full Code&lt;/h3&gt;
&lt;p&gt;The full CDK code for this post is available on &lt;a href="https://github.com/quantargo/dashboards/tree/blog-part2"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Below is the full code specifying the stack in &lt;code&gt;lib/dashboards-stack.ts&lt;/code&gt;:&lt;/p&gt;
&lt;/div&gt;
&lt;pre&gt;import { Stack, StackProps } from 'aws-cdk-lib';
import { Construct } from 'constructs';

import * as ec2 from 'aws-cdk-lib/aws-ec2';
import * as elbv2 from 'aws-cdk-lib/aws-elasticloadbalancingv2'
import * as ecs from 'aws-cdk-lib/aws-ecs'
import * as cdk from 'aws-cdk-lib'

export class DashboardsStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    const vpc = new ec2.Vpc(this, 'MyVpc');

    const lb = new elbv2.ApplicationLoadBalancer(this, 'LB', {
      vpc: vpc,
      internetFacing: true,
      loadBalancerName: 'DashboardBalancer'
    });

    const cluster = new ecs.Cluster(this, 'DashboardCluster', {
      vpc: vpc
    });

    const taskDefinition = new ecs.FargateTaskDefinition(this, 'TaskDefinition', {
      cpu: 512,
      memoryLimitMiB: 1024,
    });

    const port = 3838

    const container = taskDefinition.addContainer('Container', {
      image: ecs.ContainerImage.fromRegistry('rocker/shiny'),
      portMappings: [{ containerPort: port }],
    })
    
    const service = new ecs.FargateService(this, 'FargateService', {
      cluster: cluster,
      taskDefinition: taskDefinition,
      desiredCount: 1,
      serviceName: 'FargateService'
    })

    const tg1 = new elbv2.ApplicationTargetGroup(this, 'TargetGroup', {
      vpc: vpc,
      targets: [service],
      protocol: elbv2.ApplicationProtocol.HTTP,
      stickinessCookieDuration: cdk.Duration.days(1),
      port: port,
      healthCheck: {
        path: '/',
        port: `${port}`
      }
    })

    const listener = lb.addListener(`HTTPListener`, {
      port: 80,
      defaultAction: elbv2.ListenerAction.forward([tg1]) 
    })

    new cdk.CfnOutput(this, 'LoadBalancerDNSName', { value: lb.loadBalancerDnsName });
  }
}&lt;/pre&gt;</description></item><item><title>Creating a Dashboard Framework with AWS (Part 1)</title><link>https://www.quantargo.com/blog/2022-03-07-creating-dashboard-framework-with-aws</link><pubDate>Wed, 09 Mar 2022 12:00:00 +0000</pubDate><guid>https://www.quantargo.com/blog/2022-03-07-creating-dashboard-framework-with-aws</guid><category>R</category><category>Blog</category><description>&lt;h2&gt;Creating a Dashboard Framework with AWS (Part 1)&lt;/h2&gt;
&lt;img src="https://cdn.quantargo.com/assets/blog/2022-03-07-creating-dashboard-framework-with-aws/og.png"&gt;
&lt;p&gt;R-Shiny is an excellent framework that lets data scientists without extensive web development experience create interactive dashboards. Similar technologies in other languages include the Python frameworks Flask, Dash and Streamlit. Bringing all these different dashboards under one roof, including unified authentication and user management, can be a challenging task. In this blog series we will show how we’ve implemented such a framework with AWS.&lt;/p&gt;
&lt;div id="use-case-and-requirements" class="section level3"&gt;
&lt;h3&gt;Use Case and Requirements&lt;/h3&gt;
&lt;p&gt;The dashboard framework was created for a research department at a major financial institution. Analysts and data scientists had already created dashboards covering different topics based on numerous technologies, including R-Shiny and Python-Flask. However, a secure and unified user authentication mechanism is crucial to put the dashboards into production and restrict access to selected users. Additionally, most analysts and data scientists do not have much experience with DevOps tools such as Docker containers and thus needed an easy and automated way to adapt their existing dashboards. Last but not least, the head count on the system operations side was limited, so a simple, low-maintenance solution was needed. The entire solution needed to be implemented with Amazon Web Services (AWS) as the cloud provider of choice.&lt;/p&gt;
&lt;p&gt;Based on this situation we were asked to create a dashboard framework architecture with these requirements in mind:&lt;/p&gt;
&lt;ol style="list-style-type: decimal"&gt;
&lt;li&gt;Secure, &lt;strong&gt;end-to-end encrypted&lt;/strong&gt; (SSL, TLS) access to dashboards.&lt;/li&gt;
&lt;li&gt;Secure &lt;strong&gt;authentication&lt;/strong&gt; through E-mail and Single-Sign-On (SSO).&lt;/li&gt;
&lt;li&gt;Horizontal &lt;strong&gt;scalability&lt;/strong&gt; of dashboards according to usage, fail-safe.&lt;/li&gt;
&lt;li&gt;Easy adaptability by analysts through automation and &lt;strong&gt;continuous integration&lt;/strong&gt; (CI/CD).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Easy maintenance&lt;/strong&gt; and extensibility for system operators.&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;
&lt;div id="system-architecture" class="section level3"&gt;
&lt;h3&gt;System Architecture&lt;/h3&gt;
&lt;p&gt;All considerations above led to a simple yet effective system architecture based on selected managed AWS services including&lt;/p&gt;
&lt;ol style="list-style-type: decimal"&gt;
&lt;li&gt;
&lt;strong&gt;Application Load Balancer&lt;/strong&gt; (ALB) to handle secure end-to-end (SSL) encrypted access to the dashboards based on different host names (host-based-routing).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS Cognito&lt;/strong&gt; for user authentication based on E-mail and SSO through Ping Federate.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS Fargate&lt;/strong&gt; for horizontal scalability, fail-safe operations and easy maintenance.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS CodePipeline&lt;/strong&gt; and CodeBuild for automated builds of dashboard Docker containers.&lt;/li&gt;
&lt;li&gt;Extensive use of low-maintenance managed services (Fargate, Cognito, ALB) and the &lt;strong&gt;AWS Cloud Development Kit&lt;/strong&gt; (CDK) to define infrastructure as code, which is managed in Git and deployed via CodePipeline.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The figure below illustrates the resulting architecture in more detail:&lt;/p&gt;
&lt;/div&gt;
&lt;img src="https://cdn.quantargo.com/assets/blog/2022-03-07-creating-dashboard-framework-with-aws/architecture.png"&gt;
&lt;div id="application-load-balancer" class="section level3"&gt;
&lt;h3&gt;1. Application Load Balancer&lt;/h3&gt;
&lt;p&gt;A central piece of the system architecture is the Application Load Balancer (ALB), which routes traffic securely to each dashboard. We configured the ALB with host-based routing, so that requests to e.g. &lt;code&gt;https://dashboard1.domain.com&lt;/code&gt; or &lt;code&gt;https://dashboard2.domain.com&lt;/code&gt; are routed to the respective dashboards. The ALB handles SSL offloading, so all communication between clients and the load balancer is SSL or TLS encrypted. Additionally, we use a &lt;a href="https://docs.aws.amazon.com/elasticloadbalancing/latest/application/listener-authenticate-users.html"&gt;feature of ALB&lt;/a&gt; to authenticate users through an OIDC-compliant identity provider, such as Amazon Cognito. Thus, all users without an authentication token are redirected to a login page provided by a Cognito Hosted UI. After successful authentication, users are allowed to access the dashboard of their choice.&lt;/p&gt;
&lt;/div&gt;
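&lt;p&gt;The routing and authentication flow described above can be sketched in a few lines of TypeScript. This is a conceptual model only, not ALB configuration: the function &lt;code&gt;routeRequest&lt;/code&gt;, the target group names and the decision type are hypothetical and merely mirror the behavior outlined in the text.&lt;/p&gt;

```typescript
// Hypothetical rule table: host name -> target group name.
const hostRules = new Map([
  ["dashboard1.domain.com", "dashboard1-target-group"],
  ["dashboard2.domain.com", "dashboard2-target-group"],
]);

type RoutingDecision =
  | { action: "forward"; targetGroup: string }
  | { action: "redirect-to-login" }
  | { action: "not-found" };

// Decide what happens to a request based on its Host header and on whether
// the client already presents an authentication token.
function routeRequest(host: string, hasAuthToken: boolean): RoutingDecision {
  const targetGroup = hostRules.get(host.toLowerCase());
  if (targetGroup === undefined) {
    return { action: "not-found" }; // no rule matches this host name
  }
  if (!hasAuthToken) {
    return { action: "redirect-to-login" }; // e.g. the Cognito Hosted UI
  }
  return { action: "forward", targetGroup };
}

console.log(routeRequest("dashboard1.domain.com", true).action);  // forward
console.log(routeRequest("dashboard2.domain.com", false).action); // redirect-to-login
```

&lt;p&gt;In the real setup these decisions are expressed as listener rules and authentication actions on the ALB rather than as application code.&lt;/p&gt;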
&lt;div id="aws-cognito-for-user-authentication" class="section level3"&gt;
&lt;h3&gt;2. AWS Cognito for User Authentication&lt;/h3&gt;
&lt;p&gt;We used Cognito, the managed identity provider by AWS, which supports all important authentication mechanisms like e-mail/password (plus MFA) and federation providers like Google, Facebook or Apple. Most importantly, Cognito also supports SAML providers like Ping Federate for SSO within large corporations. The login form is also hosted by Cognito and presented to users who have not yet logged into any dashboard:&lt;/p&gt;
&lt;/div&gt;
&lt;img src="https://cdn.quantargo.com/assets/blog/2022-03-07-creating-dashboard-framework-with-aws/Cognito%20SSO%20Hosted%20UI.png"&gt;
&lt;div id="aws-fargate" class="section level3"&gt;
&lt;h3&gt;3. AWS Fargate&lt;/h3&gt;
&lt;p&gt;All dashboards run within Docker containers and are hosted as Fargate tasks within a common cluster. This makes it possible to create dashboards independently of each other, including different versions of R (or Python), packages and even operating systems. The pricing of Fargate tasks is comparable to EC2, depending on the CPU/memory configuration, but comes with the advantage of being completely managed. This also makes auto-scaling a breeze: new tasks are added depending on the current workload.&lt;/p&gt;
&lt;/div&gt;
&lt;div id="code-pipeline" class="section level3"&gt;
&lt;h3&gt;4. CodePipeline&lt;/h3&gt;
&lt;p&gt;Time to market is essential in many industries to get changes and features to the customer as fast as possible. Additionally, many dashboard developers do not want to be occupied with DevOps tasks like Docker containers and bash scripting. By using CodePipeline we made sure that dashboard developers only need to push changes to the repository. The pipeline then builds the Docker container, pushes it to the Elastic Container Registry (ECR) and subsequently deploys the new container to the cluster using CodeDeploy. The deployment ensures that users have a seamless experience by redirecting new sessions to new instances and dropping old instances once no open sessions are left.&lt;/p&gt;
&lt;/div&gt;
&lt;div id="cdk-and-cicd" class="section level3"&gt;
&lt;h3&gt;5. CDK and CI/CD&lt;/h3&gt;
&lt;p&gt;The AWS Cloud Development Kit (CDK) was a very important tool to quickly set up the entire stack, including infrastructure components, build pipelines, and even domain entries. TypeScript was our language of choice for two reasons: it has the best community support (followed by Python), and CDK itself is written in TypeScript, which makes debugging much easier. Since CDK code gets synthesized to common AWS CloudFormation templates through the command &lt;code&gt;cdk synth&lt;/code&gt;, developers get immediate feedback if something went wrong, which shortens the feedback cycle. Through &lt;code&gt;cdk deploy&lt;/code&gt; the template can be uploaded and deployed as a CloudFormation stack. Thanks to the infrastructure-as-code principle it is very easy to track changes in Git version control and deploy stacks to multiple accounts for development/staging/production.&lt;/p&gt;
&lt;/div&gt;
&lt;div id="conclusion" class="section level3"&gt;
&lt;h3&gt;Conclusion&lt;/h3&gt;
&lt;p&gt;We have given an overview of the system architecture used to deploy a simple yet powerful dashboard framework within Amazon Web Services. The presented framework fulfilled all security requirements and requires little maintenance thanks to many integrated managed AWS services. In the next post we will show how such a framework can be built from scratch using CDK, including dashboard templates for R-Shiny, Flask, Dash and Streamlit.&lt;/p&gt;
&lt;p&gt;Stay tuned! ✌️&lt;/p&gt;
&lt;/div&gt;
&lt;div id="get-in-touch" class="section level3"&gt;
&lt;h3&gt;Get in Touch&lt;/h3&gt;
&lt;p&gt;Interested in creating your own dashboard framework or other data science cloud stacks? Just get in touch:&lt;/p&gt;
&lt;p&gt;E-Mail: &lt;a href="mailto:info@quantargo.com"&gt;info@quantargo.com&lt;/a&gt; Contact Form: &lt;a href="https://www.quantargo.com/for-business/consulting#contact"&gt;Link&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;</description></item><item><title>Quantargo Workspace Now Out of Beta</title><link>https://www.quantargo.com/blog/2021-09-27-workspace-launch-announcement</link><pubDate>Tue, 28 Sep 2021 12:30:00 +0000</pubDate><guid>https://www.quantargo.com/blog/2021-09-27-workspace-launch-announcement</guid><category>R</category><category>Blog</category><description>&lt;h2&gt;Quantargo Workspace Now Out of Beta&lt;/h2&gt;
&lt;img src="https://cdn.quantargo.com/assets/blog/2021-09-27-workspace-launch-announcement/og.png"&gt;
&lt;p&gt;We’re thrilled to announce that Quantargo Workspace is now out of Beta and generally available! Quantargo Workspace lets you easily create and manage data science projects using R and Python, with advanced features like publishing, scheduling and credential management. Get started &lt;strong&gt;&lt;a href="/qbits"&gt;here for free&lt;/a&gt;&lt;/strong&gt;.&lt;/p&gt;
&lt;div id="new-features" class="section level3"&gt;
&lt;h3&gt;New Features&lt;/h3&gt;
&lt;p&gt;In tandem with the launch we also added awesome new features which enable a host of new use-cases:&lt;/p&gt;
&lt;p&gt;📝 &lt;strong&gt;Publishing&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Publishing makes it dead simple to quickly share outputs of your workspace like reports, plots or data sets. Simply hit the “Publish” button and let the magic happen: the file is executed and all outputs are automatically published to a unique URL that you can share! This URL is always up-to-date, so if you re-publish your file the publication will reflect this automatically. This works with any R or Python code as well as RMarkdown documents!&lt;/p&gt;
&lt;/div&gt;
&lt;img src="https://cdn.quantargo.com/assets/blog/2021-09-27-workspace-launch-announcement/publish-button.png"&gt;
&lt;div id="section-1" class="section level3"&gt;
&lt;p&gt;Published outputs can then be viewed and shared via a standalone link:&lt;/p&gt;
&lt;/div&gt;
&lt;img src="https://cdn.quantargo.com/assets/blog/2021-09-27-workspace-launch-announcement/publish-output.png"&gt;
&lt;div id="section-3" class="section level3"&gt;
&lt;hr&gt;
&lt;p&gt;⏱️ &lt;strong&gt;Scheduling&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;You can now create schedules from a new panel in the workspace editor. Schedules allow you to automate tedious tasks like report generation and data aggregation by running your code at regular intervals. Different intervals are supported, such as daily, weekly and monthly:&lt;/p&gt;
&lt;/div&gt;
&lt;img src="https://cdn.quantargo.com/assets/blog/2021-09-27-workspace-launch-announcement/scheduling.png"&gt;
&lt;div id="section-5" class="section level3"&gt;
&lt;p&gt;You can create multiple schedules, each with different intervals and times. This makes scheduling perfect for report generation, and together with Auto-Publish you get an always up-to-date link for your reports. Scheduling has been in the works for quite some time and it is finally ready, so please try it out and let us know what you think!&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;🔑 &lt;strong&gt;Credential Management&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;With this latest addition, you can now store confidential credentials like API keys and service credentials. Secrets allow you to securely store and use secrets in your code without exposing them. They are encrypted at rest and never shared.&lt;/p&gt;
&lt;/div&gt;
&lt;img src="https://cdn.quantargo.com/assets/blog/2021-09-27-workspace-launch-announcement/secrets.png"&gt;
&lt;div id="section-7" class="section level3"&gt;
&lt;p&gt;Together with scheduling this allows you to securely connect to third party APIs. Check out the new &lt;a href="/qbits/qbit-template-r-twitter-plot"&gt;Twitter Bot template&lt;/a&gt; for how to connect to the Twitter API through Quantargo Workspace.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="/qbits"&gt;➡️ Get Started for Free Now&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Limited-time coupon code for our Developer and PRO plans: Use the code &lt;code&gt;FREEWORKSPACE&lt;/code&gt; at checkout to get the first month completely free! Our paid plans allow you to create private workspaces and give you a lot more API calls.&lt;/p&gt;
&lt;p&gt;That’s it for now. Stay safe and healthy! ✌️😃&lt;/p&gt;
&lt;/div&gt;</description></item><item><title>Data Science Conference Austria 2021</title><link>https://www.quantargo.com/blog/2021-09-23-dsc-austria-announcement</link><pubDate>Thu, 23 Sep 2021 17:00:00 +0000</pubDate><guid>https://www.quantargo.com/blog/2021-09-23-dsc-austria-announcement</guid><category>R</category><category>Blog</category><description>&lt;h2&gt;Data Science Conference Austria 2021&lt;/h2&gt;
&lt;img src="https://cdn.quantargo.com/assets/blog/2021-09-23-dsc-austria-announcement/og.jpeg"&gt;
&lt;p&gt;Data Science Conference (DSC) Austria is knocking on YOUR door. This time the theme is AI-powered sustainability: Save the world through data! And the best part: we still have free tickets until Sept 25, so be quick! 👌💪🤞&lt;/p&gt;
&lt;p&gt;DSC Austria will happen on September 27-28th and during the event, you will get a chance to listen to over 3 Keynotes, 25 high-quality talks and 6 tech tutorials on the topic of Sustainability, AI &amp;amp; ML, Data-Driven Decision Making and Data &amp;amp; AI Literacy—but that’s not all!&lt;/p&gt;
&lt;p&gt;With the DSC Austria ticket you get:&lt;/p&gt;
&lt;p&gt;✅ Full access to DSC Austria 2021 talks and sessions&lt;/p&gt;
&lt;p&gt;✅ Entry to virtual networking sessions&lt;/p&gt;
&lt;p&gt;✅ Online certificate of attendance&lt;/p&gt;
&lt;p&gt;Check it out and reserve your spot:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://austria.datasciconference.com/ticket/"&gt;RESERVE FREE TICKET&lt;/a&gt; • &lt;a href="https://austria.datasciconference.com/schedule"&gt;CHECK FULL PROGRAM&lt;/a&gt;&lt;/p&gt;
&lt;hr&gt;

&lt;p&gt;&lt;strong&gt;Introducing the 30 Day Sustainability Data Challenge&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;As part of &lt;a href="https://austria.datasciconference.com/schedule/tutorials.html"&gt;Quantargo’s Tech tutorial&lt;/a&gt; on Sept 27 at 9 AM CET we will start the 30 Day Sustainability Data Challenge. The challenge is inspired by the 30 Day Chart Challenge and asks participants to post interesting visualizations covering sustainability on Twitter. Anyone is welcome to contribute, no matter which data source or tool you use.&lt;/p&gt;
&lt;p&gt;The only &lt;strong&gt;rules&lt;/strong&gt; are:&lt;/p&gt;
&lt;ol style="list-style-type: decimal"&gt;
&lt;li&gt;Include the hashtag &lt;a href="https://twitter.com/search?q=%2330DaySustainabilityDataChallenge"&gt;#30DaySustainabilityDataChallenge&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Include a link to the source code of your analysis/visualization.&lt;/li&gt;
&lt;li&gt;And, most importantly, add an interesting visualization, animation or meme on sustainability.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You can also consider adding other hashtags like &lt;a href="https://twitter.com/search?q=%23rstats"&gt;#rstats&lt;/a&gt; or &lt;a href="https://twitter.com/search?q=%23sustainability"&gt;#sustainability&lt;/a&gt; to reach more people.&lt;/p&gt;
&lt;p&gt;At the end of the challenge we will sum the number of &lt;strong&gt;likes and retweets&lt;/strong&gt; of each Twitter account that participated and posted according to the guidelines above. We will also post rankings as the challenge progresses. It is allowed and even encouraged to create scheduled Twitter bots using our Quantargo workspace (see next section).&lt;/p&gt;
&lt;p&gt;🏆🏆🏆 And here come the prizes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;strong&gt;first place&lt;/strong&gt; receives a yearly subscription to our &lt;a href="https://www.quantargo.com/qbits"&gt;Quantargo workspace&lt;/a&gt;, a &lt;a href="https://www.quantargo.com/pricing"&gt;yearly subscription&lt;/a&gt; to all &lt;a href="https://www.quantargo.com/courses"&gt;online courses&lt;/a&gt; and a seat at our next &lt;a href="https://www.quantargo.com/for-business"&gt;Advanced Data Transformation workshop&lt;/a&gt; worth €950 including 4 dates in November and lifetime access to all materials. Yes, we will also send a Quantargo goodie bag including sweets and LOTS of hex stickers.&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;second and third places&lt;/strong&gt; each get a seat at the &lt;a href="https://www.quantargo.com/for-business"&gt;Advanced Data Transformation workshop&lt;/a&gt; and a yearly subscription to our &lt;a href="https://www.quantargo.com/qbits"&gt;Quantargo workspace&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;The fourth and fifth place get a yearly subscription to the &lt;a href="https://www.quantargo.com/qbits"&gt;Quantargo workspace&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Quantargo Workspace&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;In the tech tutorial during the conference on Sept 27 at 9 AM CET, we will also introduce the brand new scheduling and (encrypted) secrets features of the &lt;a href="https://www.quantargo.com/qbits"&gt;Quantargo workspace&lt;/a&gt;. With these new features it is very easy to create scheduled R Bots which tweet new messages at a specified time and interval. We will show some examples of how to create bots tweeting about sustainability.&lt;/p&gt;
&lt;p&gt;Additionally, the workspace is great to seamlessly create APIs. We will show an example covering an &lt;a href="http://insideairbnb.com/get-the-data.html"&gt;AirBnB dataset&lt;/a&gt; for Vienna to&lt;/p&gt;
&lt;ol style="list-style-type: decimal"&gt;
&lt;li&gt;Create an XGBoost model using tidymodels to predict apartment prices based on attributes in listings.&lt;/li&gt;
&lt;li&gt;Use the model to find cheap apartments in Vienna and plot them with the &lt;a href="https://CRAN.R-project.org/package=leaflet"&gt;leaflet&lt;/a&gt; package.&lt;/li&gt;
&lt;li&gt;Use the API to programmatically create plots and find apartments in that area.&lt;/li&gt;
&lt;/ol&gt;

&lt;img src="https://cdn.quantargo.com/assets/blog/2021-09-23-dsc-austria-announcement/airbnb_image.png"&gt;
&lt;p&gt;&lt;a href="https://www.quantargo.com/qbits/qbit-example-airbnb-locations-vienna?panel=viewer"&gt;CHECK AIRBNB WORKSPACE&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;So stay tuned and healthy, see you at the conference and happy to see your posts! &lt;a href="https://twitter.com/search?q=%2330DaySustainabilityDataChallenge"&gt;#30DaySustainabilityDataChallenge&lt;/a&gt;.&lt;/p&gt;</description></item><item><title>The Elon Musk Tweet Effect on Dogecoin (DOGE)</title><link>https://www.quantargo.com/blog/2021-07-15-elon-musk-tweet-effect-dodge-coin</link><pubDate>Fri, 16 Jul 2021 19:00:00 +0000</pubDate><guid>https://www.quantargo.com/blog/2021-07-15-elon-musk-tweet-effect-dodge-coin</guid><category>R</category><category>Blog</category><description>&lt;h2&gt;The Elon Musk Tweet Effect on Dogecoin (DOGE)&lt;/h2&gt;
&lt;img src="https://cdn.quantargo.com/assets/blog/2021-07-15-elon-musk-tweet-effect-dodge-coin/og.png"&gt;
&lt;blockquote&gt;
&lt;p&gt;Unveil the Dogefather&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;a href="https://twitter.com/elonmusk"&gt;Elon Musk&lt;/a&gt; is known for his regular tweets about many different topics—in particular his companies &lt;a href="https://twitter.com/tesla"&gt;Tesla&lt;/a&gt; and &lt;a href="https://twitter.com/spacex"&gt;SpaceX&lt;/a&gt;. With close to 60 million followers he truly is a Twitter celebrity and his opinions have a big impact on technologies and companies. Most recently his tweets also covered &lt;a href="https://twitter.com/dogecoin"&gt;Dogecoin&lt;/a&gt;, a crypto currency featuring a dog. With a little R-code we checked the effect of his tweets on the Dodgecoin price and discovered significant spikes.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Ingredients&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;As data sets we use the tweet timeline from @&lt;a href="https://twitter.com/elonmusk"&gt;elonmusk&lt;/a&gt;. With the &lt;a href="https://CRAN.R-project.org/package=rtweet"&gt;rtweet&lt;/a&gt; package the timeline can be downloaded as&lt;/p&gt;
&lt;pre&gt;library(rtweet)

## Visit https://developer.twitter.com/ to get access key
token &lt;- create_token(
  app = "YourAppName",
  consumer_key = "&lt;CONSUMER-KEY&gt;",
  consumer_secret = "&lt;CONSUMER-SECRET&gt;",
  access_token = "&lt;ACCESS-TOKEN&gt;",
  access_secret = "&lt;ACCESS-SECRET&gt;"
  )

tmls &lt;- get_timelines("elonmusk",
  n = 3200,
  token = token)&lt;/pre&gt;
&lt;p&gt;Note that token creation can be a bit tricky since you first need to register an app at the Twitter developer page &lt;a href="https://developer.twitter.com" class="uri"&gt;https://developer.twitter.com&lt;/a&gt;. It’s important to fill in not only the &lt;code&gt;consumer_key/consumer_secret&lt;/code&gt; but also the &lt;code&gt;access_token/access_secret&lt;/code&gt; to successfully create the token. After successful retrieval we get a data frame with (an excerpt of) the tweets from the @&lt;a href="https://twitter.com/elonmusk"&gt;elonmusk&lt;/a&gt; timeline:&lt;/p&gt;
&lt;pre&gt;library(dplyr)

tmls %&gt;%
  select(created_at, text)&lt;/pre&gt;
&lt;pre&gt;# A tibble: 600 x 2
   created_at          text                                          
   &lt;dttm&gt;              &lt;chr&gt;                                         
 1 2021-07-13 03:05:20 "those who attack space\nmaybe don’t realize …
 2 2021-07-13 02:39:11 "@Rogozin 👏👏"                               
 3 2021-07-13 02:37:57 "Loki is pretty good. Basically, live-action …
 4 2021-07-13 02:33:53 "@dogeofficialceo 🤣"                         
 5 2021-07-13 02:33:26 "@CGDaveMac Maybe if it sees a Shiba Inu, the…
 6 2021-07-13 02:30:16 "🤯 https://t.co/Z11qszTY4v"                  
 7 2021-07-12 22:18:34 "@OwenSparks_ @jeremyjudkins Haha Buzz Corp –…
 8 2021-07-12 22:07:39 "@ErcXspace @kimpaquette Interesting idea"    
 9 2021-07-12 21:40:43 "@kimpaquette Not yet, but they will. It’s ne…
10 2021-07-12 21:38:29 "@cleantechnica OPP? https://t.co/muZdxKdUXz" 
# … with 590 more rows&lt;/pre&gt;
&lt;p&gt;We executed &lt;code&gt;get_timelines()&lt;/code&gt; multiple times to get most tweets out of the timeline.&lt;/p&gt;
&lt;p&gt;To study the price effect of his tweets the &lt;a href="https://github.com/daroczig/binancer"&gt;binancer&lt;/a&gt; package was used to download intraday open-high-low-close (OHLC) data in 1-minute intervals from the &lt;a href="https://www.binance.com"&gt;Binance&lt;/a&gt; cryptocurrency exchange. We decided on the Dogecoin vs. Bitcoin (DOGE/BTC) currency pair to adjust for overall market movements and to better see the price effect. The function &lt;code&gt;binance_klines()&lt;/code&gt; returns a data.table containing all intraday pricing data:&lt;/p&gt;
&lt;pre&gt;# Install through `remotes::install_github("daroczig/binancer")`
library(binancer) 
binance_klines("DOGEBTC", interval = "1m") %&gt;%
  select(open_time, open, high, low, close, volume)&lt;/pre&gt;
&lt;pre&gt;               open_time     open     high      low    close volume
  1: 2021-07-16 07:51:00 5.78e-06 5.78e-06 5.77e-06 5.78e-06  80201
  2: 2021-07-16 07:52:00 5.78e-06 5.78e-06 5.77e-06 5.77e-06   2337
  3: 2021-07-16 07:53:00 5.78e-06 5.78e-06 5.77e-06 5.78e-06   6161
  4: 2021-07-16 07:54:00 5.77e-06 5.78e-06 5.77e-06 5.78e-06  31250
  5: 2021-07-16 07:55:00 5.78e-06 5.78e-06 5.77e-06 5.77e-06  67220
 ---                                                               
496: 2021-07-16 16:06:00 5.65e-06 5.65e-06 5.63e-06 5.65e-06 145786
497: 2021-07-16 16:07:00 5.65e-06 5.65e-06 5.64e-06 5.64e-06 135197
498: 2021-07-16 16:08:00 5.65e-06 5.65e-06 5.64e-06 5.65e-06  14782
499: 2021-07-16 16:09:00 5.65e-06 5.65e-06 5.64e-06 5.65e-06 254613
500: 2021-07-16 16:10:00 5.65e-06 5.66e-06 5.64e-06 5.66e-06  85440&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Twitter Event Study&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;A typical way to study such events on financial markets is to look at the price movements right before and after each tweet happened. The price action around each tweet can give us an indication of its market effect. For this analysis it is critical to correctly join our two data sources, the Twitter data and the price data. We also need to add the time relative to the tweet event (e.g. minutes relative to the tweet timestamp), which can be used as a common scale for plotting. For this task the function &lt;code&gt;find_price_window()&lt;/code&gt; was created to return the relative price changes around a specified &lt;code&gt;date&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;library(lubridate)

find_price_window &lt;- function(date, sym = "DOGEBTC", window_length = 20) {
  date_rounded &lt;- floor_date(date, unit = "minute")
  start_time &lt;- date_rounded - window_length * 60
  end_time &lt;- date_rounded + window_length * 60
  dodgebtc &lt;- binance_klines(sym, interval = '1m', start_time = start_time, end_time = end_time)
  dodgebtc$close_time &lt;- ceiling_date(dodgebtc$close_time, unit = "minute")
  close_zero &lt;- dodgebtc$close[dodgebtc$close_time == date_rounded]
  out &lt;- dodgebtc %&gt;%
    mutate(timediff = difftime(close_time, date_rounded, units = "mins")) %&gt;%
    mutate(price_rel = close/close_zero - 1) %&gt;%
    mutate(date_rounded = date_rounded) %&gt;%
    select(date_rounded, time = timediff, price = price_rel, volume = taker_buy_base_asset_volume) %&gt;%
    arrange(time)
  out
}&lt;/pre&gt;
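&lt;p&gt;The normalization inside &lt;code&gt;find_price_window()&lt;/code&gt; can also be tried offline. The following is a minimal sketch on made-up closing prices (not Binance data); the helper name &lt;code&gt;relative_prices()&lt;/code&gt; is our own illustration and not part of binancer:&lt;/p&gt;
&lt;pre&gt;# Hypothetical helper (not part of binancer): price change relative to
# the close at the event minute
relative_prices &lt;- function(close, time, event_time) {
  close_zero &lt;- close[time == event_time]
  # exactly one bar must match the event minute
  stopifnot(length(close_zero) == 1)
  data.frame(
    time = as.numeric(difftime(time, event_time, units = "mins")),
    price = close / close_zero - 1
  )
}

# Made-up prices: one bar per minute, two minutes before and after the event
event_time &lt;- as.POSIXct("2021-05-24 19:49:00", tz = "UTC")
time &lt;- event_time + (-2:2) * 60
close &lt;- c(100, 101, 100, 104, 105)

price_window &lt;- relative_prices(close, time, event_time)
price_window&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;stopifnot()&lt;/code&gt; guard makes one assumption explicit: exactly one price bar has to match the rounded tweet minute, otherwise there is no reference price to normalize against.&lt;/p&gt;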
&lt;p&gt;Now we can create the data table &lt;code&gt;dogetweets&lt;/code&gt; which contains the tweets AND the price action around each tweet by the relative &lt;code&gt;time&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;library(purrr)

dogetweets &lt;- tmls %&gt;%
  filter(grepl("doge", text)) %&gt;%
  mutate(date = as.Date(created_at)) %&gt;% 
  mutate(price_window = map(created_at, find_price_window))  %&gt;%
  mutate(event_num = 1:nrow(.))

dogetweets %&gt;%
  select(created_at, text, price_window)&lt;/pre&gt;
&lt;pre&gt;# A tibble: 12 x 3
   created_at          text                           price_window   
   &lt;dttm&gt;              &lt;chr&gt;                          &lt;list&gt;         
 1 2021-07-13 02:33:53 "@dogeofficialceo 🤣"          &lt;data.table [4…
 2 2021-07-08 22:34:41 "@dogeofficialceo @newscienti… &lt;data.table [4…
 3 2021-07-08 22:32:19 "@dogeofficialceo @newscienti… &lt;data.table [4…
 4 2021-07-01 01:25:05 "@dogeofficialceo @torybruno … &lt;data.table [4…
 5 2021-06-25 02:00:20 "@hiddin2urleft @ItsDogeCoin … &lt;data.table [4…
 6 2021-06-09 20:07:38 "@dogeofficialceo @MattWallac… &lt;data.table [4…
 7 2021-06-05 08:21:59 "@lexfridman @VitalikButerin … &lt;data.table [4…
 8 2021-06-01 23:54:20 "@dogeofficialceo @SouthPark … &lt;data.table [4…
 9 2021-05-25 05:37:12 "@heydave7 @dogecoin_devs Dog… &lt;data.table [4…
10 2021-05-24 19:49:56 "If you’d like to help develo… &lt;data.table [4…
11 2021-05-22 22:07:12 "@flcnhvy @thatdogegirl @What… &lt;data.table [4…
12 2021-05-20 13:57:18 "@thatdogegirl @WhatsupFranks… &lt;data.table [4…&lt;/pre&gt;
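&lt;p&gt;Note that &lt;code&gt;grepl("doge", text)&lt;/code&gt; is case-sensitive, so a tweet that mentions only "Doge" or "DOGE" and no lowercase handle such as @dogecoin_devs would be skipped. A small sketch with made-up tweet texts (not taken from the timeline) shows the difference when &lt;code&gt;ignore.case = TRUE&lt;/code&gt; is set:&lt;/p&gt;
&lt;pre&gt;# Made-up example texts, not taken from the timeline
example_text &lt;- c("Doge to the moon", "@dogecoin_devs thanks", "Starship update")

# Case-sensitive match: misses the capitalized "Doge"
match_sensitive &lt;- grepl("doge", example_text)

# Case-insensitive match: also finds "Doge"
match_insensitive &lt;- grepl("doge", example_text, ignore.case = TRUE)

data.frame(example_text, match_sensitive, match_insensitive)&lt;/pre&gt;
&lt;p&gt;Whether the broader match is desirable depends on the question asked; it may also pull in tweets that mention Doge only in passing.&lt;/p&gt;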
&lt;p&gt;We can finally &lt;code&gt;unnest()&lt;/code&gt; the price data from &lt;code&gt;price_window&lt;/code&gt; and create a ggplot containing the price movements around each elonmusk &lt;code&gt;doge&lt;/code&gt; tweet:&lt;/p&gt;
&lt;pre&gt;library(ggplot2)
library(tidyr)

dogetweets %&gt;%
  unnest(price_window) %&gt;%
  select(created_at, date_rounded, time, price, volume) %&gt;%
  mutate(time = as.numeric(time)) %&gt;%
  mutate(created_at = format(created_at, "%Y-%m-%d %H:%M:%S")) %&gt;%
  ggplot(mapping = aes(x = time, y = price)) + 
  geom_line(aes(color = created_at, group = created_at)) + 
  scale_y_continuous(labels = scales::percent) + 
  geom_vline(xintercept = 0) + 
  ylab("") + 
  xlab("Minutes to Tweet Creation") + 
  ggtitle("Price Impact DOGE/BTC around @elonmusk Tweet") +
  theme_minimal()&lt;/pre&gt;
&lt;img src="https://cdn.quantargo.com/assets/blog/2021-07-15-elon-musk-tweet-effect-dodge-coin/2021-07-15-elon-musk-tweet-effect-dodge-coin_files/figure-html/unnamed-chunk-6-1.png"&gt;
&lt;p&gt;We can also show a table containing the top 10 tweets by absolute price movement ten minutes after the tweet:&lt;/p&gt;
&lt;pre&gt;dogetweets %&gt;%
  unnest(price_window) %&gt;%
  filter(time == 10) %&gt;%
  arrange(desc(abs(price))) %&gt;%
  select(created_at, price, text) %&gt;%
  head(10)&lt;/pre&gt;
&lt;pre&gt;# A tibble: 10 x 3
   created_at             price text                                 
   &lt;dttm&gt;                 &lt;dbl&gt; &lt;chr&gt;                                
 1 2021-05-24 19:49:56  0.0494  "If you’d like to help develop Doge,…
 2 2021-05-25 05:37:12  0.0144  "@heydave7 @dogecoin_devs Doge has d…
 3 2021-05-20 13:57:18  0.0131  "@thatdogegirl @WhatsupFranks @Tesla…
 4 2021-06-01 23:54:20  0.0100  "@dogeofficialceo @SouthPark When I …
 5 2021-06-09 20:07:38  0.00757 "@dogeofficialceo @MattWallace888 No…
 6 2021-07-13 02:33:53  0.00647 "@dogeofficialceo 🤣"                
 7 2021-06-25 02:00:20 -0.00508 "@hiddin2urleft @ItsDogeCoin @Invest…
 8 2021-07-08 22:34:41  0.00321 "@dogeofficialceo @newscientist Kind…
 9 2021-05-22 22:07:12 -0.00221 "@flcnhvy @thatdogegirl @WhatsupFran…
10 2021-06-05 08:21:59  0.00195 "@lexfridman @VitalikButerin @ethere…&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Results&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;For some tweets we indeed see a slight price effect for the DOGE/BTC quote on the Binance exchange. The most important tweet in our sample, which triggered an immediate, positive price reaction of almost 5% versus Bitcoin, seems to be &lt;a href="https://twitter.com/elonmusk/status/1396916392629137409?s=20"&gt;this&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote class="twitter-tweet"&gt;
&lt;p lang="en" dir="ltr"&gt;
If you’d like to help develop Doge, please submit ideas on GitHub &amp;amp; &lt;a href="https://t.co/liAPQMFaQB"&gt;https://t.co/liAPQMFaQB&lt;/a&gt; &lt;a href="https://twitter.com/dogecoin_devs?ref_src=twsrc%5Etfw"&gt;&lt;span class="citation"&gt;@dogecoin_devs&lt;/span&gt;&lt;/a&gt;
&lt;/p&gt;
— Elon Musk (&lt;span class="citation"&gt;@elonmusk&lt;/span&gt;) &lt;a href="https://twitter.com/elonmusk/status/1396916392629137409?ref_src=twsrc%5Etfw"&gt;May 24, 2021&lt;/a&gt;
&lt;/blockquote&gt;
&lt;script async src="https://platform.twitter.com/widgets.js" charset="utf-8"&gt;&lt;/script&gt;
&lt;p&gt;&lt;strong&gt;Reproducing Results, QBit Workspace&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;If you’re interested in a fully reproducible workspace including all data sets for download, check the created workspace &lt;a href="https://www.quantargo.com/qbits/qbit-example-elon-musk-twitter-analysis"&gt;HERE&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In the next post we will investigate how the sentiment of tweets may affect the price direction of specific markets.&lt;/p&gt;
&lt;p&gt;Happy coding!&lt;/p&gt;</description></item><item><title>Full Workspace Automation through a Programmatic Interface (API) Available Now</title><link>https://www.quantargo.com/blog/2021-07-02-qbits-full-workspace-automation-through-programmatic-interface-api-available-now</link><pubDate>Mon, 05 Jul 2021 15:00:00 +0000</pubDate><guid>https://www.quantargo.com/blog/2021-07-02-qbits-full-workspace-automation-through-programmatic-interface-api-available-now</guid><category>R</category><category>Blog</category><description>&lt;h2&gt;Full Workspace Automation through a Programmatic Interface (API) Available Now&lt;/h2&gt;
&lt;img src="https://cdn.quantargo.com/assets/blog/2021-07-02-qbits-full-workspace-automation-through-programmatic-interface-api-available-now/og.png"&gt;
&lt;blockquote&gt;
&lt;p&gt;Each workspace already is an API&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;QBit Workspace is a new service to immediately deploy data science results at scale. You can think of it as an online data science editor (like RStudio) which can also be controlled and automated from any programming language through a REST API. Once a workspace has been created—including code, environment objects and files—there is no need for a separate (API) deployment step any more. Each workspace already is an API. With its powerful REST API interface it can be easily embedded into any application, app or programming language without running and managing your own R- or Python server.&lt;/p&gt;
&lt;p&gt;We’re now happy to announce the launch of our &lt;a href="https://www.quantargo.com/qbits"&gt;API service in public beta&lt;/a&gt;, which allows you to control every aspect of the workspace programmatically, including actions like:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Workspace creation&lt;/li&gt;
&lt;li&gt;Workspace deployment&lt;/li&gt;
&lt;li&gt;Code execution&lt;/li&gt;
&lt;li&gt;Rendering of RMarkdown documents&lt;/li&gt;
&lt;li&gt;File up- and downloads&lt;/li&gt;
&lt;li&gt;Package install/remove&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The API thus enables completely new use cases which can be easily embedded with any programming language into web applications or mobile apps. No API packages like R &lt;a href="https://CRAN.R-project.org/package=plumber"&gt;plumber&lt;/a&gt; or Python &lt;a href="https://flask.palletsprojects.com"&gt;Flask&lt;/a&gt; are needed!&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Get Started with the QBit Workspaces API&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;To use the API from R, first install the &lt;a href="https://github.com/quantargo/qbit"&gt;qbit package&lt;/a&gt; from the Quantargo GitHub repository:&lt;/p&gt;
&lt;pre&gt;remotes::install_github("quantargo/qbit")&lt;/pre&gt;
&lt;p&gt;Next, you need to retrieve your free API key from the Quantargo page &lt;a href="https://www.quantargo.com/dashboard/workspaces"&gt;settings section&lt;/a&gt;:&lt;/p&gt;
&lt;img src="https://cdn.quantargo.com/assets/blog/2021-07-02-qbits-full-workspace-automation-through-programmatic-interface-api-available-now/api-key-settings.png"&gt;
&lt;p&gt;For more information about API key creation and usage also see our detailed &lt;a href="https://www.quantargo.com/docs/01-qbit-workspace/04-api/section-retrieving-your-api-key"&gt;step-by-step guide&lt;/a&gt;. Ideally, set your API key &lt;code&gt;QKEY&lt;/code&gt; through the &lt;code&gt;options()&lt;/code&gt; settings as&lt;/p&gt;
&lt;pre&gt;options(QKEY = "&lt;ENTER-API-KEY-HERE&gt;")&lt;/pre&gt;
&lt;p&gt;so that all further API calls use the key accordingly. Now you are ready to interact with QBit workspace! As a first example, we’ll show how to create an API-ready RMarkdown report within R.&lt;/p&gt;
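&lt;p&gt;As an aside on the key setup: if you prefer not to hard-code the key in a script, one option is to read it from an environment variable. This is a sketch only. The environment variable name &lt;code&gt;QKEY&lt;/code&gt; is our assumption (it could be set, for example, in &lt;code&gt;~/.Renviron&lt;/code&gt;), while the &lt;code&gt;QKEY&lt;/code&gt; option is the one described above:&lt;/p&gt;
&lt;pre&gt;# Assumption: the API key was stored beforehand in an environment
# variable called QKEY (the name is our choice, not prescribed by qbit)
api_key &lt;- Sys.getenv("QKEY")

if (nzchar(api_key)) {
  # Same option as above, but without the key appearing in the script
  options(QKEY = api_key)
} else {
  message("Environment variable QKEY is not set")
}&lt;/pre&gt;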
&lt;p&gt;&lt;strong&gt;Creating RMarkdown Documents through the qbit R-API&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;RMarkdown combines markdown text with R outputs (e.g. plots, tables) to create reproducible documents in multiple output formats (e.g. HTML, PDF, Word, Powerpoint, see also &lt;a href="https://www.quantargo.com/blog/2021-06-23-create-rmarkdown-documents-with-qbit-workspace"&gt;here&lt;/a&gt;). Most R-data scientists use their local (RStudio) environment to produce these reports. But what if we want to render these reports through a web application on the fly, maybe even parametrized or with updated input data sets? In the following section we’ll create a QBit workspace for RMarkdown to quickly render an HTML document through the API.&lt;/p&gt;
&lt;p&gt;Let’s start by creating a new workspace based on the RMarkdown template:&lt;/p&gt;
&lt;pre&gt;qbit_id &lt;- qbit::create(qbit_name = "RMarkdown Example HTML document")&lt;/pre&gt;
&lt;pre&gt;qbit_id&lt;/pre&gt;
&lt;pre&gt;[1] "qbit-rmarkdown-example-html-document-eGJWV404T"&lt;/pre&gt;
&lt;p&gt;The created workspace received a new and unique &lt;code&gt;qbit_id&lt;/code&gt; based on its &lt;code&gt;qbit_name&lt;/code&gt; title. You can also visit the &lt;a href="https://www.quantargo.com/qbits/qbit-rmarkdown-example-html-document-eGJWV404T?panel=viewer"&gt;new workspace online&lt;/a&gt; and even share its link with your friends/co-workers. Further changes to your workspace can now be done through the API using the &lt;code&gt;qbit::deploy()&lt;/code&gt; function or directly within the &lt;a href="https://www.quantargo.com/qbits/qbit-rmarkdown-example-html-document-eGJWV404T?panel=viewer"&gt;online editor&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Once you are satisfied with your workspace you can run specific R commands, retrieve their respective outputs and integrate them into your application. Most typically, you might want to execute specific commands like &lt;code&gt;predict()&lt;/code&gt; (for model predictions) or any kind of user-defined function through &lt;code&gt;qbit::run&lt;/code&gt;. The &lt;code&gt;qbit::run&lt;/code&gt; interface executes arbitrary R code, so it is very general and can support complex API use cases. For our RMarkdown use case we would like to render the &lt;code&gt;main.Rmd&lt;/code&gt; file as an HTML document with &lt;code&gt;qbit::render&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;render_out &lt;- qbit::render(qbit_id)&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;$console_output&lt;/code&gt; element contains a data frame (tibble) of all content created by the call:&lt;/p&gt;
&lt;pre&gt;render_out$console_output&lt;/pre&gt;
&lt;pre&gt;# A tibble: 8 x 3
  type       content                                          name   
  &lt;chr&gt;      &lt;chr&gt;                                            &lt;chr&gt;  
1 code-input "rmarkdown::render(\"main.Rmd\")"                &lt;NA&gt;   
2 code-mess… "Warning message: \n\nprocessing file: main.Rmd… &lt;NA&gt;   
3 code-outp… "\r  |                                         … &lt;NA&gt;   
4 code-outp… "\r  |                                         … &lt;NA&gt;   
5 code-mess… "Warning message: output file: main.knit.md\n\n" &lt;NA&gt;   
6 code-outp… "/usr/bin/pandoc +RTS -K512m -RTS main.utf8.md … &lt;NA&gt;   
7 code-mess… "Warning message: \nOutput created: main.html\n" &lt;NA&gt;   
8 file       "https://cdn.quantargo.com/assets/user/courses/… main.h…&lt;/pre&gt;
&lt;p&gt;The link to the created RMarkdown document is located in the row where content &lt;code&gt;type&lt;/code&gt; equals &lt;code&gt;"file"&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;library(dplyr)
render_out$console_output %&gt;%
  filter(type == "file")&lt;/pre&gt;
&lt;pre&gt;# A tibble: 1 x 3
  type  content                                              name    
  &lt;chr&gt; &lt;chr&gt;                                                &lt;chr&gt;   
1 file  https://cdn.quantargo.com/assets/user/courses/b8451… main.ht…&lt;/pre&gt;
&lt;p&gt;The included &lt;a href="https://cdn.quantargo.com/assets/user/courses/b8451061-48a6-475b-95d5-b8bb6ddbaaed/main.html"&gt;link&lt;/a&gt; in the &lt;code&gt;content&lt;/code&gt; column can be easily integrated into your own web application (via an &lt;code&gt;&amp;lt;iframe&amp;gt;&lt;/code&gt; tag) or just downloaded locally (e.g. via &lt;code&gt;download.file()&lt;/code&gt; in R).&lt;/p&gt;
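&lt;p&gt;As a rough sketch of that last step, the link can be pulled out of the output table and passed on. The data frame below is a hand-built stand-in for &lt;code&gt;render_out$console_output&lt;/code&gt; (only two rows and two of its columns), so the snippet runs without an API key. The download itself is left commented out because it needs network access:&lt;/p&gt;
&lt;pre&gt;# Hand-built stand-in for render_out$console_output (assumption: the real
# tibble has the same type and content columns)
console_output &lt;- data.frame(
  type = c("code-input", "file"),
  content = c(
    "rmarkdown::render(\"main.Rmd\")",
    "https://cdn.quantargo.com/assets/user/courses/b8451061-48a6-475b-95d5-b8bb6ddbaaed/main.html"
  ),
  stringsAsFactors = FALSE
)

# Keep the link from the row describing the rendered file
report_url &lt;- console_output$content[console_output$type == "file"]

# Local target path, named after the last part of the URL
destfile &lt;- file.path(tempdir(), basename(report_url))

# download.file(report_url, destfile = destfile)  # requires network access&lt;/pre&gt;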
&lt;p&gt;Thanks to the serverless (AWS Lambda) back-end the QBit Workspace is quickly scalable to thousands of concurrent requests. The service is &lt;a href="https://www.quantargo.com/qbits"&gt;now available in public beta&lt;/a&gt; and can be deployed into your own infrastructure (Docker/Container based including Lambda, Kubernetes, OpenShift) &lt;a href="https://www.quantargo.com/contact"&gt;upon request&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Happy deploying!&lt;/p&gt;</description></item><item><title>Create and Preview RMarkdown Documents with QBit Workspace</title><link>https://www.quantargo.com/blog/post/2021-06-23-create-rmarkdown-documents-with-qbit-workspace</link><pubDate>Fri, 25 Jun 2021 16:35:00 +0000</pubDate><guid>https://www.quantargo.com/blog/post/2021-06-23-create-rmarkdown-documents-with-qbit-workspace</guid><category>R</category><category>Blog</category><description>&lt;h2&gt;Create and Preview RMarkdown Documents with QBit Workspace&lt;/h2&gt;
&lt;img src="https://cdn.quantargo.com/assets/blog/2021-06-23-create-rmarkdown-documents-with-qbit-workspace/og.png"&gt;
&lt;p&gt;RMarkdown is an excellent format to create documents which combine code outputs with text, a programming paradigm called &lt;a href="https://en.wikipedia.org/wiki/Literate_programming"&gt;Literate Programming&lt;/a&gt; first introduced by Donald Knuth. Although RMarkdown documents are mostly used by the R community, preferably within the &lt;a href="https://www.rstudio.com/products/rstudio"&gt;RStudio IDE&lt;/a&gt;, the format is not restricted to the R language. Other language engines like Python, SQL or Julia can also be used with RMarkdown. The current knitr package version 1.33 lists as many as 44 available engines:&lt;/p&gt;
&lt;pre&gt;names(knitr::knit_engines$get())&lt;/pre&gt;
&lt;pre&gt; [1] "awk"       "bash"      "coffee"    "gawk"      "groovy"   
 [6] "haskell"   "lein"      "mysql"     "node"      "octave"   
[11] "perl"      "psql"      "Rscript"   "ruby"      "sas"      
[16] "scala"     "sed"       "sh"        "stata"     "zsh"      
[21] "highlight" "Rcpp"      "tikz"      "dot"       "c"        
[26] "cc"        "fortran"   "fortran95" "asy"       "cat"      
[31] "asis"      "stan"      "block"     "block2"    "js"       
[36] "css"       "sql"       "go"        "python"    "julia"    
[41] "sass"      "scss"      "R"         "bslib"    &lt;/pre&gt;
&lt;p&gt;Thanks to the pandoc document converter RMarkdown also supports many different &lt;a href="https://rmarkdown.rstudio.com/lesson-9.html"&gt;output formats&lt;/a&gt; which can be set with the &lt;code&gt;output&lt;/code&gt; parameter in the YAML header, including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;HTML&lt;/strong&gt;: Static HTML files, &lt;code&gt;output: html_document&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PDF&lt;/strong&gt;: PDF documents generated through LaTeX, &lt;code&gt;output: pdf_document&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Word&lt;/strong&gt;: Microsoft Word documents, &lt;code&gt;output: word_document&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Presentations&lt;/strong&gt;: Presentation formats like MS Powerpoint, &lt;code&gt;output: powerpoint_presentation&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dashboards&lt;/strong&gt;: &lt;a href="https://pkgs.rstudio.com/flexdashboard"&gt;flexdashboard&lt;/a&gt; &lt;code&gt;output: flexdashboard::flex_dashboard&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;QBit Workspace facilitates the authoring of RMarkdown documents directly within the browser thanks to instant previews in the &lt;em&gt;Viewer&lt;/em&gt; pane. The instant preview functionality leads to faster development of RMarkdown documents. See below a short presentation of how RMarkdown authoring works:&lt;/p&gt;
&lt;iframe width="672" height="378" src="https://www.youtube.com/embed/ggWQ8Vte5_c" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen&gt;
&lt;/iframe&gt;
&lt;div id="create-new-rmarkdown-document" class="section level3"&gt;
&lt;h3&gt;Create New RMarkdown Document&lt;/h3&gt;
&lt;p&gt;In the &lt;a href="https://www.quantargo.com/dashboard/workspaces"&gt;Workspaces Section&lt;/a&gt; section of your &lt;a href="https://www.quantargo.com/dashboard"&gt;Dashboard&lt;/a&gt; you can create a &lt;strong&gt;New&lt;/strong&gt; Workspace, enter its name and select the RMarkdown template:&lt;/p&gt;
&lt;/div&gt;
&lt;img src="https://cdn.quantargo.com/assets/blog/2021-06-23-create-rmarkdown-documents-with-qbit-workspace/rmarkdown_create.png"&gt;
&lt;div id="section-1" class="section level3"&gt;
&lt;h3&gt;&lt;/h3&gt;
&lt;/div&gt;
&lt;div id="create-html-document" class="section level3"&gt;
&lt;h3&gt;Create HTML Document&lt;/h3&gt;
&lt;p&gt;Set &lt;code&gt;output: html_document&lt;/code&gt; in the YAML header of the document and hit the &lt;strong&gt;Render&lt;/strong&gt; button:&lt;/p&gt;
&lt;/div&gt;
&lt;img src="https://cdn.quantargo.com/assets/blog/2021-06-23-create-rmarkdown-documents-with-qbit-workspace/rmarkdown_example_html.png"&gt;
&lt;div id="section-3" class="section level3"&gt;
&lt;h3&gt;&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://www.quantargo.com/qbits/qbit-rmarkdown-example-html-document-eGJWV404T?panel=viewer"&gt;RMarkdown Example HTML document&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id="create-pdf-document" class="section level3"&gt;
&lt;h3&gt;Create PDF Document&lt;/h3&gt;
&lt;p&gt;Set &lt;code&gt;output: pdf_document&lt;/code&gt; and &lt;strong&gt;Render&lt;/strong&gt;:&lt;/p&gt;
&lt;/div&gt;
&lt;img src="https://cdn.quantargo.com/assets/blog/2021-06-23-create-rmarkdown-documents-with-qbit-workspace/rmarkdown_example_pdf.png"&gt;
&lt;div id="section-5" class="section level3"&gt;
&lt;h3&gt;&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://www.quantargo.com/qbits/qbit-rmarkdown-example-pdf-document-FEN_bLjx2?panel=viewer"&gt;RMarkdown Example PDF document&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id="create-word-document" class="section level3"&gt;
&lt;h3&gt;Create Word Document&lt;/h3&gt;
&lt;p&gt;Set &lt;code&gt;output: word_document&lt;/code&gt; and &lt;strong&gt;Render&lt;/strong&gt;:&lt;/p&gt;
&lt;/div&gt;
&lt;img src="https://cdn.quantargo.com/assets/blog/2021-06-23-create-rmarkdown-documents-with-qbit-workspace/rmarkdown_example_word.png"&gt;
&lt;div id="section-7" class="section level3"&gt;
&lt;h3&gt;&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://www.quantargo.com/qbits/qbit-rmarkdown-example-word-document-S8wQR2rMS?panel=viewer"&gt;RMarkdown Example Word document&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id="create-powerpoint-presentation" class="section level3"&gt;
&lt;h3&gt;Create PowerPoint Presentation&lt;/h3&gt;
&lt;p&gt;Set &lt;code&gt;output: powerpoint_presentation&lt;/code&gt; and &lt;strong&gt;Render&lt;/strong&gt;:&lt;/p&gt;
&lt;/div&gt;
&lt;img src="https://cdn.quantargo.com/assets/blog/2021-06-23-create-rmarkdown-documents-with-qbit-workspace/rmarkdown_example_powerpoint.png"&gt;
&lt;div id="section-9" class="section level3"&gt;
&lt;h3&gt;&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://www.quantargo.com/qbits/qbit-rmarkdown-example-powerpoint-presentation-m41NDlPUL?panel=viewer"&gt;RMarkdown Example PowerPoint presentation&lt;/a&gt;&lt;/p&gt;
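&lt;p&gt;Outside the workspace, the same output formats can be produced from the R console. The following is only a minimal sketch, assuming the &lt;code&gt;rmarkdown&lt;/code&gt; package and pandoc are installed; the temporary file and its contents are hypothetical and serve purely as an illustration:&lt;/p&gt;
&lt;pre&gt;library(rmarkdown)

# Hypothetical minimal document, written to a temporary file for illustration
rmd &amp;lt;- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  'title: "Example Report"',
  "output: html_document",
  "---",
  "",
  "Hello from RMarkdown."
), rmd)

# Render using the format declared in the YAML header
html_file &amp;lt;- render(rmd, quiet = TRUE)

# Override the YAML header to build a Word document from the same source
word_file &amp;lt;- render(rmd, output_format = "word_document", quiet = TRUE)&lt;/pre&gt;
&lt;p&gt;Passing &lt;code&gt;output_format&lt;/code&gt; is one way to switch formats without editing the header; &lt;code&gt;pdf_document&lt;/code&gt; additionally requires a LaTeX installation.&lt;/p&gt;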
&lt;p&gt;Give it a try by either creating a new workspace from scratch or by copying one of the existing &lt;a href="https://www.quantargo.com/qbits/explore/tags/rmarkdown"&gt;QBit Workspace examples&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Happy reporting, feedback welcome! ✌️&lt;/p&gt;
&lt;/div&gt;</description></item><item><title>Get your unique certificate with online assessments</title><link>https://www.quantargo.com/blog/post/2021-05-04-get-your-unique-certificate-with-online-assessments</link><pubDate>Thu, 06 May 2021 11:10:00 +0000</pubDate><guid>https://www.quantargo.com/blog/post/2021-05-04-get-your-unique-certificate-with-online-assessments</guid><category>R</category><category>Blog</category><description>&lt;h2&gt;Get your unique certificate with online assessments&lt;/h2&gt;
&lt;img src="https://cdn.quantargo.com/assets/blog/2021-05-04-get-your-unique-certificate-with-online-assessments/og.png"&gt;
&lt;p&gt;At Quantargo, teaching is a big part of what we do. You can use our platform to dive into new data science skills and to understand previously untouched subjects–all in an easy-to-use and interactive environment. Our online data science courses provide a pre-configured data science environment so that you can focus solely on the content.&lt;/p&gt;
&lt;p&gt;However, for many of you this may not be enough. Assessments are a crucial prerequisite to prove your skills, show your strengths and detect your weak spots. You may need to prove your skills to a third party, such as an institution or an employer, in which case being able to prove your practical and theoretical skills is very important. This is exactly what we’re tackling with this latest update to our course platform, by introducing online assessments.&lt;/p&gt;
&lt;div id="how-it-works" class="section level4"&gt;
&lt;h4&gt;How it works&lt;/h4&gt;
&lt;p&gt;Our course lessons give learners a friendly and interactive environment to dive into new topics. If you forgot something while you’re doing an exercise, you can always go back to get a good grasp of it.&lt;/p&gt;
&lt;p&gt;The new online assessments raise the bar compared to the course mode. First, you need to unlock the assessment by finishing the course. This ensures you have all the information you need to pass the test. Then, during the assessment, you have only limited time available to finish it. Furthermore, while the assessment is ongoing all course materials are locked, so you can’t cheat by opening the course on a different device. Your personal &lt;a href="https://www.quantargo.com/blog/2021-03-25-create-your-personal-cheat-sheets"&gt;cheat sheets&lt;/a&gt;, however, remain accessible during the assessment.&lt;/p&gt;
&lt;p&gt;After passing the final course assessment, you get a unique certificate.&lt;/p&gt;
&lt;p&gt;All in all, you really need to know what you’re doing in order to pass it–which is exactly what you’d want from an assessment!&lt;/p&gt;
&lt;/div&gt;
&lt;div id="consolidate-your-knowledge-with-trainings" class="section level4"&gt;
&lt;h4&gt;Consolidate Your Knowledge with Trainings&lt;/h4&gt;
&lt;/div&gt;
&lt;img src="https://cdn.quantargo.com/assets/blog/2021-05-04-get-your-unique-certificate-with-online-assessments/start-training-assessment.png"&gt;
&lt;div id="section-1" class="section level3"&gt;
&lt;h3&gt;&lt;/h3&gt;
&lt;p&gt;The final course assessment unlocks your unique certificate and tests your progress across the whole course. Trainings, on the other hand, allow you to start assessments &lt;em&gt;per lesson&lt;/em&gt;. This is a perfect way to solidify your knowledge and find weak spots in your understanding of the subject matter. All your attempts are saved for easy review afterwards.&lt;/p&gt;
&lt;/div&gt;
&lt;img src="https://cdn.quantargo.com/assets/blog/2021-05-04-get-your-unique-certificate-with-online-assessments/trainings-overview.png"&gt;
&lt;div id="available-now" class="section level3"&gt;
&lt;h3&gt;Available Now&lt;/h3&gt;
&lt;p&gt;Online assessments, as well as trainings, are available for all 15+ courses and lessons. If you’re a PRO subscriber already you can go to &lt;a href="https://www.quantargo.com/dashboard"&gt;your dashboard&lt;/a&gt; right now and start training!&lt;/p&gt;
&lt;p&gt;If you’re just getting started, check out the &lt;strong&gt;FREE introduction course lessons&lt;/strong&gt; &lt;em&gt;&lt;a href="https://www.quantargo.com/courses/course-r-introduction/01-basics"&gt;Basics&lt;/a&gt;&lt;/em&gt; and &lt;em&gt;&lt;a href="https://www.quantargo.com/courses/course-r-introduction/02-data-frames-tibbles"&gt;Data Frames and Tibbles&lt;/a&gt;&lt;/em&gt;. After completion you’ll find the new “Trainings” button enabled in your course dashboard!&lt;/p&gt;
&lt;p&gt;Happy learning ✌️&lt;/p&gt;
&lt;/div&gt;</description></item><item><title>New Course Available Now: Machine Learning with Tidymodels</title><link>https://www.quantargo.com/blog/post/2021-04-08-new-course-machine-learning-with-tidymodels</link><pubDate>Tue, 20 Apr 2021 09:30:00 +0000</pubDate><guid>https://www.quantargo.com/blog/post/2021-04-08-new-course-machine-learning-with-tidymodels</guid><category>R</category><category>Blog</category><description>&lt;h2&gt;New Course Available Now: Machine Learning with Tidymodels&lt;/h2&gt;
&lt;img src="https://cdn.quantargo.com/assets/blog/2021-04-08-new-course-machine-learning-with-tidymodels/course_og.png"&gt;
&lt;p&gt;The ever increasing application of machine learning models in industry and academia requires tools which are easy to use and ensure a reliable model fitting process. The R package universe covers practically all statistical models on the planet including all relevant machine learning models like neural nets, support vector machines, decision trees, and random forests. However, most of these packages do not provide a consistent interface, which makes it hard to fit and compare models from different families. Even worse, it is hard to create standardized workflows for typical machine learning projects which ensure that&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;no information leaks between training and test data, which would inflate performance numbers.&lt;/li&gt;
&lt;li&gt;models are compared on the same re-sampling procedures.&lt;/li&gt;
&lt;li&gt;performance metrics are calculated correctly.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;pkg&gt;tidymodels&lt;/pkg&gt; framework is a new package ecosystem, in which all steps of the machine learning workflow are implemented through dedicated R packages. The consistency of these packages ensures their interoperability and ease of use. Most importantly, the framework makes your machine learning workflow &lt;strong&gt;easier to understand&lt;/strong&gt; and &lt;strong&gt;faster to implement&lt;/strong&gt;. &lt;em&gt;tidymodels&lt;/em&gt; should definitely be part of every R data scientist’s tool box. Additionally, it fits perfectly into the &lt;pkg&gt;tidyverse&lt;/pkg&gt; package ecosystem and provides excellent compatibility with packages like &lt;pkg&gt;dplyr&lt;/pkg&gt; or &lt;pkg&gt;ggplot2&lt;/pkg&gt;.&lt;/p&gt;
&lt;img src="https://www.quantargo.com/assets/blog/2021-04-08-new-course-machine-learning-with-tidymodels/lessons.png"&gt;
&lt;div id="section-1" class="section level3"&gt;
&lt;h3&gt;&lt;/h3&gt;
&lt;p&gt;Each lesson in the &lt;a href="https://www.quantargo.com/courses/course-r-machine-learning-with-tidymodels"&gt;Machine Learning with Tidymodels&lt;/a&gt; course module covers one essential skill; together, the lessons cover the entire machine learning workflow:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The tidymodels Machine Learning Workflow&lt;/strong&gt;: Start your machine learning journey and learn the most fundamental building blocks of the tidymodels framework.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Preprocessing with recipes&lt;/strong&gt;: Learn why data preprocessing is crucial in your machine learning workflow and create your first data transformations with the recipes package.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model Fitting with parsnip&lt;/strong&gt;: Fit machine learning models using the parsnip package including linear regression, decision trees and boosted trees.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model Evaluation and Performance Metrics with yardstick&lt;/strong&gt;: Estimate model quality based on different performance metrics using the yardstick package.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resampling techniques using rsample&lt;/strong&gt;: Avoid overfitting by using resampling techniques including cross-validation and bootstrap using the rsample package.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model optimization using tune&lt;/strong&gt;: Optimize your model parameters using the tune package to find models which predict new data well.&lt;/li&gt;
&lt;/ul&gt;
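&lt;p&gt;To give a rough idea of how these packages fit together, here is a minimal sketch of one pass through the workflow. It is not taken from the course; the built-in &lt;code&gt;mtcars&lt;/code&gt; data set, the linear model and the 80/20 split are assumptions chosen only for illustration:&lt;/p&gt;
&lt;pre&gt;library(tidymodels)

set.seed(123)

# rsample: hold out a test set so it never influences model fitting
split &amp;lt;- initial_split(mtcars, prop = 0.8)
train &amp;lt;- training(split)
test  &amp;lt;- testing(split)

# recipes: preprocessing is estimated on the training data only
rec &amp;lt;- recipe(mpg ~ ., data = train) %&amp;gt;%
  step_normalize(all_numeric_predictors())

# parsnip: model specification with a consistent interface
mod &amp;lt;- linear_reg() %&amp;gt;%
  set_engine("lm")

# workflows: bundle preprocessing and model, then fit
wf &amp;lt;- workflow() %&amp;gt;%
  add_recipe(rec) %&amp;gt;%
  add_model(mod)
wf_fit &amp;lt;- fit(wf, data = train)

# yardstick: evaluate predictions on the held-out test set
preds &amp;lt;- predict(wf_fit, new_data = test) %&amp;gt;%
  bind_cols(test)
rmse(preds, truth = mpg, estimate = .pred)&lt;/pre&gt;
&lt;p&gt;Because the recipe is fitted inside the workflow, the normalization parameters come from the training data alone, which is one way to guard against leakage.&lt;/p&gt;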
&lt;p&gt;➔ &lt;a href="https://www.quantargo.com/courses/course-r-machine-learning-with-tidymodels"&gt;&lt;strong&gt;Get started for Free: Machine Learning with Tidymodels&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id="get-your-personalized-cheat-sheets" class="section level3"&gt;
&lt;h3&gt;Get Your Personalized Cheat Sheets&lt;/h3&gt;
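&lt;p&gt;With the latest update on our &lt;a href="https://www.quantargo.com/courses"&gt;course platform&lt;/a&gt; you can create your own personalized cheat sheets based on your progress. See also this &lt;a href="https://www.quantargo.com/blog/2021-03-25-create-your-personal-cheat-sheets"&gt;blog post&lt;/a&gt; for more information.&lt;/p&gt;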
&lt;/div&gt;
&lt;div id="get-your-certificate-with-pro" class="section level3"&gt;
&lt;h3&gt;Get Your Certificate with PRO&lt;/h3&gt;
&lt;p&gt;After completing &lt;em&gt;Machine Learning with Tidymodels&lt;/em&gt; you get a unique certificate, which you can download as PDF and include in your portfolio!&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.quantargo.com/pricing#courses"&gt;Learn more about PRO&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;</description></item><item><title>Free Data Science Training for People with Disabilities</title><link>https://www.quantargo.com/blog/post/2021-04-12-free-training-for-people-with-disabilities</link><pubDate>Tue, 13 Apr 2021 09:30:00 +0000</pubDate><guid>https://www.quantargo.com/blog/post/2021-04-12-free-training-for-people-with-disabilities</guid><category>R</category><category>Blog</category><description>&lt;h2&gt;Free Data Science Training for People with Disabilities&lt;/h2&gt;
&lt;img src="https://cdn.quantargo.com/assets/blog/2021-04-12-free-training-for-people-with-disabilities/course_og.png"&gt;
&lt;p&gt;At Quantargo we’re on a mission to provide people with the best data science knowledge and (cloud-powered) tools so that they can find new jobs as data scientists, improve their skills in their current roles or do research in a powerful yet reproducible way.&lt;/p&gt;
&lt;p&gt;However, we have realized that more often than not people with disabilities are left out of the equation in many ways. I’m very grateful to &lt;a href="https://www.linkedin.com/in/ivatsolova"&gt;Iva Tsolova&lt;/a&gt; from &lt;a href="https://jambacareers.at"&gt;Jamba&lt;/a&gt; who approached us and explained how she was able to organise trainings and successful job placements for Jamba’s students.&lt;/p&gt;
&lt;p&gt;We are therefore very happy to announce that we are offering—together with &lt;a href="https://jambacareers.at"&gt;Jamba&lt;/a&gt; and &lt;a href="https://ai4da.com"&gt;AI4DA&lt;/a&gt;—a completely free data science training for people with disabilities. The agenda of the trainings, quite similar to our corporate offerings, is centered around one course module with weekly onboarding/mentoring sessions, a final preparation/project session and a final exam.&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://www.quantargo.com/courses/course-r-introduction"&gt;Introduction to R&lt;/a&gt; Module &lt;strong&gt;starts on April 22&lt;/strong&gt; at 4 PM CET and continues on a &lt;strong&gt;weekly basis until May 20&lt;/strong&gt;. The live workshops will be held over Zoom.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;There are still FREE SPOTS LEFT&lt;/strong&gt;. If you want to join the training and additionally get a 6-month PRO subscription for free, please create an account at &lt;a href="https://www.quantargo.com"&gt;www.quantargo.com&lt;/a&gt; and send an email with your short résumé to &lt;a href="mailto:courses@quantargo.com"&gt;courses@quantargo.com&lt;/a&gt; by April 20.&lt;/p&gt;
&lt;p&gt;➔ &lt;a href="https://www.quantargo.com/courses/course-r-introduction"&gt;&lt;strong&gt;Get started for Free: Introduction to R&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;
&lt;div id="get-your-personalized-cheat-sheets" class="section level3"&gt;
&lt;h3&gt;Get Your Personalized Cheat Sheets&lt;/h3&gt;
&lt;p&gt;With the newest update on our &lt;a href="https://www.quantargo.com/courses"&gt;course platform&lt;/a&gt; you can create your own personalized cheat-sheets based on your progress. See also this &lt;a href="https://www.quantargo.com/blog/2021-03-25-create-your-personal-cheat-sheets"&gt;blog post&lt;/a&gt; for more information.&lt;/p&gt;
&lt;/div&gt;
&lt;div id="get-your-certificate-with-pro" class="section level3"&gt;
&lt;h3&gt;Get Your Certificate with PRO&lt;/h3&gt;
&lt;p&gt;After completing &lt;em&gt;Machine Learning with Tidymodels&lt;/em&gt; you get a unique certificate, which you can download as PDF and include in your portfolio!&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.quantargo.com/pricing#courses"&gt;Learn more about PRO&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;</description></item><item><title>Create Your Personal Cheat Sheets</title><link>https://www.quantargo.com/blog/post/2021-03-25-create-your-personal-cheat-sheets</link><pubDate>Thu, 25 Mar 2021 12:00:00 +0000</pubDate><guid>https://www.quantargo.com/blog/post/2021-03-25-create-your-personal-cheat-sheets</guid><category>R</category><category>Blog</category><description>&lt;h2&gt;Create Your Personal Cheat Sheets&lt;/h2&gt;
&lt;img src="https://cdn.quantargo.com/assets/blog/2021-03-25-create-your-personal-cheat-sheets/og.png"&gt;
&lt;p&gt;Cheat Sheets are a handy way to have the most important facts right at your fingertips. Especially when learning new concepts or a whole programming language, cheat sheets can help to stay on top of all the new things you’ve just learned. When talking to learners we immediately sensed their big interest in getting additional materials to better keep track of important key concepts and code patterns.&lt;/p&gt;
&lt;p&gt;Creating good cheat sheets is hard, but there are many great examples in the R community. The most famous are the ones &lt;a href="https://rstudio.com/resources/cheatsheets"&gt;created by RStudio&lt;/a&gt;, which are uniquely designed and very helpful. We carefully considered all the options concerning cheat sheets for our &lt;a href="/courses"&gt;courses at Quantargo&lt;/a&gt; and ultimately took quite a different route.&lt;/p&gt;
&lt;p&gt;The content in each of our course modules is structured through lessons which are further divided into different chapters. The key concept of each chapter is represented by a so-called &lt;strong&gt;recipe&lt;/strong&gt;, which typically focuses on one code fragment at a time. For example, the recipe in the chapter &lt;a href="/courses/course-r-introduction/04-ggplot/02-scatterplot"&gt;Create a scatter plot with ggplot&lt;/a&gt; consists of the following fragment:&lt;/p&gt;
&lt;pre&gt;library(ggplot2)
ggplot(___) + 
  geom_point(
    mapping = aes(x = ___, y = ___)
  )&lt;/pre&gt;
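&lt;p&gt;Filled in, the recipe could look like the sketch below. The &lt;code&gt;mpg&lt;/code&gt; data set (shipped with ggplot2) and the columns &lt;code&gt;displ&lt;/code&gt; and &lt;code&gt;hwy&lt;/code&gt; are assumptions picked for illustration and are not necessarily the ones used in the chapter:&lt;/p&gt;
&lt;pre&gt;library(ggplot2)

# Scatter plot of engine displacement against highway mileage
p &amp;lt;- ggplot(mpg) +
  geom_point(
    mapping = aes(x = displ, y = hwy)
  )

# Printing the object draws the plot
print(p)&lt;/pre&gt;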
&lt;p&gt;We took advantage of our unique course structure and now show all of the completed recipes in one unified, interactive view in the course dashboard. This gives you a grand overview of the whole course – your personalized cheat sheet&lt;sup&gt;2&lt;/sup&gt; 😮.&lt;/p&gt;
&lt;img src="https://cdn.quantargo.com/assets/blog/2021-03-25-create-your-personal-cheat-sheets/overview.png"&gt;
&lt;div id="section-1" class="section level3"&gt;
&lt;h3&gt;&lt;/h3&gt;
&lt;p&gt;And yes, you can also download the cheat sheets as PDFs, which are not only personalized but also reflect your current learning progress. &lt;strong&gt;With this latest update you can now download cheat sheets for all of our 15+ course lessons&lt;/strong&gt;. The current PDF cheat sheets might not compare to hand-crafted ones in terms of design, but we think they are even more helpful for reviewing and memorizing key concepts. We’re glad to hear your feedback!&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;If you’re not subscribed to &lt;a href="/pricing"&gt;PRO&lt;/a&gt; yet, the new cheat sheets and course dashboard updates are also available for our free lessons!&lt;/p&gt;
&lt;p&gt;Start your data science journey now at &lt;a href="/courses"&gt;quantargo.com/courses&lt;/a&gt; and join our community of 2000+ learners. If you have already started this journey, head over to your &lt;a href="/dashboard"&gt;course dashboard&lt;/a&gt; to download your new cheat sheets.&lt;/p&gt;
&lt;/div&gt;</description></item><item><title>Inspecting Data Structures</title><link>https://www.quantargo.com/courses/course-r-advanced-data-transformation/02-data-wrangling-with-base-r/01-inspecting-data-structures</link><pubDate>Wed, 17 Mar 2021 20:00:00 +0000</pubDate><guid>https://www.quantargo.com/courses/course-r-advanced-data-transformation/02-data-wrangling-with-base-r/01-inspecting-data-structures</guid><category>R</category><category>Course</category><category>Introduction</category><category>Basics</category><category>Interactive</category><description>&lt;img src="https://cdn.quantargo.com/assets/courses/course-r-advanced-data-transformation/02-data-wrangling-with-base-r/01-inspecting-data-structures.png"&gt;
&lt;p&gt;The first step of any data-related task is to inspect the data we are dealing with. This is crucial for data wrangling as well, since we need to explore the current structure of the data in order to identify the required transformations.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Inspect tabular data interactively with &lt;code&gt;View()&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Examine the data structure of each object using &lt;code&gt;str()&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;View(___)
str(___)&lt;/pre&gt;
&lt;h2&gt;Interactive Inspection with View()&lt;/h2&gt;
&lt;p&gt;Before starting with any kind of data analysis, it is crucial to understand the data we are dealing with. Plotting is a very important tool to get a quick overview of the statistical properties of data and to detect possible outliers. However, visualization might not always be possible, due to the size or complexity of the data set.&lt;/p&gt;
&lt;p&gt;As an alternative, it might be convenient to dig through the data set interactively. This can be done with a spreadsheet-like interface, similar to Microsoft Excel, which lets you filter, sort and inspect tabular data structures.&lt;/p&gt;
&lt;p&gt;R provides the function &lt;code&gt;View()&lt;/code&gt;, which shows an interactive data viewer. Depending on the platform and editor used, this viewer might look different. Below you can see an example of the &lt;code&gt;View()&lt;/code&gt; function in RStudio:&lt;/p&gt;
&lt;pre&gt;View(gapminder)&lt;/pre&gt;
&lt;img src="https://cdn.quantargo.com/assets/courses/course-r-advanced-data-transformation/02-data-wrangling-with-base-r/images/view_rstudio.png"&gt;
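&lt;p&gt;Since &lt;code&gt;View()&lt;/code&gt; needs an interactive session, a script may want a console fallback. The following is a minimal sketch, assuming the &lt;code&gt;gapminder&lt;/code&gt; data set comes from the CRAN package of the same name; the fallback via &lt;code&gt;head()&lt;/code&gt; is our own addition for illustration:&lt;/p&gt;
&lt;pre&gt;# Assumption: gapminder is provided by the gapminder package
library(gapminder)

if (interactive()) {
  # Opens the spreadsheet-like viewer (appearance depends on the editor)
  View(gapminder)
} else {
  # Non-interactive sessions (e.g. scripts) print the first rows instead
  print(head(gapminder, 10))
}&lt;/pre&gt;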
&lt;h2&gt;Quiz: Interactive Inspection with View()&lt;/h2&gt;
Why should you inspect data sets with &lt;code&gt;View()&lt;/code&gt; before starting with your analysis?
&lt;ul&gt;&lt;li&gt;Get a first impression of the data quality.&lt;/li&gt;&lt;li&gt;Find outliers and missing values.&lt;/li&gt;&lt;li&gt;Interactively inspect the data set.&lt;/li&gt;&lt;li&gt;Create reproducible outputs for reports.&lt;/li&gt;&lt;/ul&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-advanced-data-transformation/02-data-wrangling-with-base-r/01-inspecting-data-structures/quiz-1"&gt;Start Quiz&lt;/a&gt;
&lt;h2&gt;Exercise: Interactive Inspection with View()&lt;/h2&gt;
&lt;p&gt;Use the &lt;code&gt;View()&lt;/code&gt; function on the &lt;code&gt;gapminder&lt;/code&gt; data set and determine the country with the highest life expectancy. Also pay attention to the year the projection was made. Set the variables &lt;code&gt;country&lt;/code&gt; and &lt;code&gt;year&lt;/code&gt; accordingly!&lt;/p&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-advanced-data-transformation/02-data-wrangling-with-base-r/01-inspecting-data-structures/exercise-02-01-02"&gt;Start Exercise&lt;/a&gt;
&lt;h2&gt;Examining Data Structures with str()&lt;/h2&gt;
&lt;p&gt;Sometimes we need to analyze very large and complex data structures. Displaying these data sources may already be overwhelming and simply not possible with interactive tools. In these cases, the &lt;code&gt;str()&lt;/code&gt; function comes to the rescue and prints the structure, as well as the first few values of any R object. Even very large and complex data structures can easily be displayed in the console that way.&lt;/p&gt;
&lt;p&gt;As an example, let’s take a look at the structure of the &lt;code&gt;TitanicSurvival&lt;/code&gt; data set:&lt;/p&gt;
&lt;pre&gt;library(carData)
str(TitanicSurvival)&lt;/pre&gt;
&lt;pre&gt;'data.frame':   1309 obs. of  4 variables:
 $ survived      : Factor w/ 2 levels "no","yes": 2 2 1 1 1 2 2 1 2 1 ...
 $ sex           : Factor w/ 2 levels "female","male": 1 2 1 2 1 2 1 2 1 2 ...
 $ age           : num  29 0.917 2 30 25 ...
 $ passengerClass: Factor w/ 3 levels "1st","2nd","3rd": 1 1 1 1 1 1 1 1 1 1 ...&lt;/pre&gt;
&lt;p&gt;It consists of three factor columns (&lt;code&gt;survived&lt;/code&gt;, &lt;code&gt;sex&lt;/code&gt; and &lt;code&gt;passengerClass&lt;/code&gt;) and one numeric column, &lt;code&gt;age&lt;/code&gt;. Note that for factor columns both the labels (e.g. &lt;code&gt;"no"&lt;/code&gt;, &lt;code&gt;"yes"&lt;/code&gt;) and the underlying integer codes are displayed.&lt;/p&gt;
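&lt;p&gt;The relationship between factor labels and integer codes can be checked on a small example. This is a sketch with a hand-made factor (the values are made up and not taken from &lt;code&gt;TitanicSurvival&lt;/code&gt;):&lt;/p&gt;
&lt;pre&gt;# Hypothetical factor with the same two levels as the survived column
survived &amp;lt;- factor(c("yes", "yes", "no", "no"), levels = c("no", "yes"))

levels(survived)      # the labels: "no" "yes"
as.integer(survived)  # the integer codes: 2 2 1 1

# str() shows both at once
str(survived)&lt;/pre&gt;
&lt;p&gt;Each integer code is simply the position of the label in &lt;code&gt;levels()&lt;/code&gt;, which is why &lt;code&gt;"yes"&lt;/code&gt; appears as &lt;code&gt;2&lt;/code&gt;.&lt;/p&gt;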
&lt;h2&gt;Quiz: Examining Data Structures with str()&lt;/h2&gt;
In which cases is it beneficial to use the &lt;code&gt;str()&lt;/code&gt; function?
&lt;ul&gt;&lt;li&gt;Get an overview of highly complex data sets.&lt;/li&gt;&lt;li&gt;Create summary statistics describing the data set.&lt;/li&gt;&lt;li&gt;Plot histograms.&lt;/li&gt;&lt;li&gt;Only for &lt;code&gt;data.frames&lt;/code&gt;. &lt;code&gt;str()&lt;/code&gt; can only handle &lt;code&gt;data.frames&lt;/code&gt; and cannot be used for other objects.&lt;/li&gt;&lt;/ul&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-advanced-data-transformation/02-data-wrangling-with-base-r/01-inspecting-data-structures/quiz-2"&gt;Start Quiz&lt;/a&gt;
&lt;h2&gt;Quiz: Interpret the Output of str()&lt;/h2&gt;
&lt;pre&gt;library(babynames)
str(babynames)&lt;/pre&gt;
&lt;pre&gt;tibble [1,924,665 × 5] (S3: tbl_df/tbl/data.frame)
 $ year: num [1:1924665] 1880 1880 1880 1880 1880 1880 1880 1880 1880 1880 ...
 $ sex : chr [1:1924665] "F" "F" "F" "F" ...
 $ name: chr [1:1924665] "Mary" "Anna" "Emma" "Elizabeth" ...
 $ n   : int [1:1924665] 7065 2604 2003 1939 1746 1578 1472 1414 1320 1288 ...
 $ prop: num [1:1924665] 0.0724 0.0267 0.0205 0.0199 0.0179 ...&lt;/pre&gt;
Examine the output of the &lt;code&gt;str()&lt;/code&gt; function with the &lt;strong&gt;babynames&lt;/strong&gt; dataset above. Which statements about the data set are correct?
&lt;ul&gt;&lt;li&gt;The data set has five rows.&lt;/li&gt;&lt;li&gt;The data set has five columns.&lt;/li&gt;&lt;li&gt;The &lt;code&gt;prop&lt;/code&gt; column is of type &lt;code&gt;numeric&lt;/code&gt;.&lt;/li&gt;&lt;li&gt;The column &lt;code&gt;sex&lt;/code&gt; is of type &lt;code&gt;factor&lt;/code&gt;.&lt;/li&gt;&lt;/ul&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-advanced-data-transformation/02-data-wrangling-with-base-r/01-inspecting-data-structures/quiz-3"&gt;Start Quiz&lt;/a&gt;
&lt;p&gt;Inspecting Data Structures is an excerpt from the course &lt;a href="https://www.quantargo.com/courses/course-r-advanced-data-transformation"&gt;Advanced Data Transformation&lt;/a&gt;, which is available at &lt;a href="https://www.quantargo.com"&gt;quantargo.com&lt;/a&gt;.&lt;/p&gt;&lt;p&gt;&lt;a href="https://www.quantargo.com/courses/course-r-advanced-data-transformation"&gt;VIEW FULL COURSE&lt;/a&gt;&lt;/p&gt;</description></item><item><title>Complete the Introduction to Machine Learning Course for Free until March 21</title><link>https://www.quantargo.com/blog/post/2021-03-09-introduction-to-machine-learning-course-free-until-march-21</link><pubDate>Tue, 09 Mar 2021 08:30:00 +0000</pubDate><guid>https://www.quantargo.com/blog/post/2021-03-09-introduction-to-machine-learning-course-free-until-march-21</guid><category>R</category><category>Blog</category><description>&lt;h2&gt;Complete the Introduction to Machine Learning Course for Free until March 21&lt;/h2&gt;
&lt;img src="https://www.quantargo.com/assets/blog/2021-03-09-introduction-to-machine-learning-course-free-until-march-21/course_og.png"&gt;
&lt;p&gt;For all of you who want to get started with machine learning, we have a special offer! Until March 21 you can finish all of the new &lt;a href="https://www.quantargo.com/courses/course-machine-learning-introduction"&gt;Introduction to Machine Learning&lt;/a&gt; course lessons for free and collect the following recipes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.quantargo.com/courses/course-machine-learning-introduction/01-basics/01-what-is-machine-learning"&gt;&lt;strong&gt;What is Machine Learning?&lt;/strong&gt;&lt;/a&gt;: Differentiate between artificial intelligence, machine learning and deep learning. Identify machine learning use cases.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.quantargo.com/courses/course-machine-learning-introduction/01-basics/02-learning-techniques"&gt;&lt;strong&gt;Machine Learning Techniques: Supervised-, unsupervised- and reinforcement learning&lt;/strong&gt;&lt;/a&gt;: Learn about supervised, unsupervised and reinforcement learning techniques.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.quantargo.com/courses/course-machine-learning-introduction/01-basics/03-regression-classification"&gt;&lt;strong&gt;Supervised Learning with Regression and Classification&lt;/strong&gt;&lt;/a&gt;: Know what predictors and outcome variables are. See how predictors differ in regression and classification tasks.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The introduction to machine learning is an ideal preparation for our upcoming &lt;strong&gt;Machine Learning with Tidymodels&lt;/strong&gt; course.&lt;/p&gt;
&lt;p&gt;Stay tuned and have fun with the new course!&lt;/p&gt;
&lt;div id="get-your-free-certificate" class="section level3"&gt;
&lt;h3&gt;Get Your Free Certificate&lt;/h3&gt;
&lt;p&gt;Each lesson covers key concepts in small understandable chunks. After finishing all lessons you receive a unique certificate for completing the course. Download your certificate as a PDF and include it in your portfolio!&lt;/p&gt;
&lt;/div&gt;
&lt;img src="https://www.quantargo.com/assets/blog/2021-03-09-introduction-to-machine-learning-course-free-until-march-21/certificate.png"&gt;
&lt;div id="section-1" class="section level3"&gt;
&lt;h3&gt;&lt;/h3&gt;
&lt;p&gt;Happy learning!&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.quantargo.com/courses/course-machine-learning-introduction"&gt;START COURSE&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;</description></item><item><title>The 3 Doors of Data Transformation</title><link>https://www.quantargo.com/courses/course-r-advanced-data-transformation/01-introduction/01-introduction</link><pubDate>Thu, 04 Mar 2021 08:30:00 +0000</pubDate><guid>https://www.quantargo.com/courses/course-r-advanced-data-transformation/01-introduction/01-introduction</guid><category>R</category><category>Course</category><category>Introduction</category><category>Basics</category><category>Interactive</category><description>&lt;img src="https://www.quantargo.com/assets/courses/course-r-advanced-data-transformation/01-introduction/01-introduction.png"&gt;
&lt;p&gt;This course covers the three most popular package ecosystems for data transformation in R: &lt;strong&gt;base&lt;/strong&gt; R, &lt;strong&gt;tidyverse&lt;/strong&gt; and &lt;strong&gt;data.table&lt;/strong&gt;. You will see which options are better suited for specific use cases in terms of stability, features, speed and consistency.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Get familiar with the main approaches for data handling in R&lt;/li&gt;
&lt;li&gt;Understand the advantages and disadvantages of each option&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Introduction&lt;/h2&gt;
&lt;img src="https://www.quantargo.com/assets/courses/course-r-advanced-data-transformation/01-introduction/images/data-wrangling-3doors.png"&gt;
&lt;p&gt;Data can come in many shapes and formats from various sources. The first step before any statistical analysis can be done is to transform the data into the most suitable format. Depending on the use case, this step might require different packages.&lt;/p&gt;
&lt;p&gt;In R, there are three different package ecosystems for transforming data, namely &lt;strong&gt;base&lt;/strong&gt; R, &lt;strong&gt;tidyverse&lt;/strong&gt; and &lt;strong&gt;data.table&lt;/strong&gt;. Functions can often be combined across these ecosystems, but subtle differences mean this is not always possible.&lt;/p&gt;
&lt;p&gt;The most important difference is that each ecosystem defines its own data frame object: data frames, tibbles and data tables. Although tibbles and data tables inherit behavior from their common ancestor, the data frame, some small differences make them hard to re-use in different ecosystems. Choose your door wisely.&lt;/p&gt;
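&lt;p&gt;The inheritance and one of the subtle differences can be seen in a short sketch. It assumes the &lt;code&gt;tibble&lt;/code&gt; and &lt;code&gt;data.table&lt;/code&gt; packages are installed; the toy column &lt;code&gt;x&lt;/code&gt; is made up for illustration:&lt;/p&gt;
&lt;pre&gt;library(tibble)
library(data.table)

df &amp;lt;- data.frame(x = 1:3)
tb &amp;lt;- as_tibble(df)
dt &amp;lt;- as.data.table(df)

# All three objects inherit from data.frame
class(df)  # "data.frame"
class(tb)  # "tbl_df" "tbl" "data.frame"
class(dt)  # "data.table" "data.frame"

# A subtle difference: selecting a single column
df[, "x"]  # base R drops to a plain integer vector
tb[, "x"]  # a tibble stays a one-column tibble&lt;/pre&gt;
&lt;p&gt;Code that expects a vector from &lt;code&gt;df[, "x"]&lt;/code&gt; can therefore behave differently when handed a tibble.&lt;/p&gt;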
&lt;h2&gt;The base R Package Ecosystem&lt;/h2&gt;
&lt;p&gt;The &lt;strong&gt;base&lt;/strong&gt; R package is already integrated into the basic R installation. Thus, it can be used easily even within very restrictive IT landscapes. It is also an appropriate choice for environments where frequent package installations and updates might be infeasible.&lt;/p&gt;
&lt;p&gt;The &lt;strong&gt;base&lt;/strong&gt; R package has stood the test of time and is considered to be very stable, with only very few changes even across major version updates. Chances are high that dated R code still works after years, even on different machines or operating systems.&lt;/p&gt;
&lt;p&gt;However, &lt;strong&gt;base&lt;/strong&gt; R does not offer the fastest performance for large data sets compared to other packages and tools. In addition, due to its long history, some &lt;strong&gt;base&lt;/strong&gt; R functions lack consistency and make common workflows harder to integrate. The feature set of &lt;strong&gt;base&lt;/strong&gt; R for data manipulation tasks like joins or reshaping/pivoting also lags behind other packages.&lt;/p&gt;
&lt;p&gt;Since &lt;strong&gt;base&lt;/strong&gt; R is installed on every machine running R, it is important for every data scientist to know its features. Its power might surprise you, and you never know which machine you end up working with.&lt;/p&gt;
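&lt;p&gt;As a minimal sketch of what &lt;strong&gt;base&lt;/strong&gt; R offers without any additional packages, the example below aggregates and joins the built-in &lt;code&gt;mtcars&lt;/code&gt; dataset. The &lt;code&gt;labels&lt;/code&gt; lookup table and its &lt;code&gt;size&lt;/code&gt; column are made up for illustration.&lt;/p&gt;
&lt;pre&gt;# Mean miles per gallon by number of cylinders
mpg_by_cyl &lt;- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

# Hypothetical lookup table, only used to demonstrate a join
labels &lt;- data.frame(cyl = c(4, 6, 8),
                     size = c("small", "medium", "large"))

# merge() performs the join on the shared cyl column
joined &lt;- merge(mpg_by_cyl, labels, by = "cyl")
joined&lt;/pre&gt;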
&lt;h2&gt;The tidyverse Package Ecosystem&lt;/h2&gt;
&lt;p&gt;The &lt;strong&gt;tidyverse&lt;/strong&gt; package ecosystem provides many packages for data manipulation, most importantly &lt;strong&gt;dplyr&lt;/strong&gt; and &lt;strong&gt;tidyr&lt;/strong&gt;. These packages are well maintained and already widely adopted in the R community. Their clear and consistent syntax makes learning a breeze. Moreover, all common functions (or verbs) can be combined using the pipe &lt;code&gt;%&amp;gt;%&lt;/code&gt; operator.&lt;/p&gt;
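&lt;p&gt;A short sketch of how such verbs chain together, assuming the &lt;strong&gt;dplyr&lt;/strong&gt; package is installed and using the built-in &lt;code&gt;mtcars&lt;/code&gt; dataset (the threshold of 100 horsepower is arbitrary):&lt;/p&gt;
&lt;pre&gt;library(dplyr)

mpg_by_cyl &lt;- mtcars %&gt;%
  filter(hp &gt; 100) %&gt;%             # keep cars with more than 100 hp
  group_by(cyl) %&gt;%                # one group per cylinder count
  summarise(mean_mpg = mean(mpg))    # one summary row per group

mpg_by_cyl&lt;/pre&gt;
&lt;p&gt;Each verb takes a data frame as its first argument and returns a data frame, which is what allows the steps to be read from top to bottom.&lt;/p&gt;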
&lt;p&gt;The feature set of &lt;strong&gt;tidyverse&lt;/strong&gt; for data reshaping and joins is among the broadest in the R ecosystem. Through extension packages like &lt;strong&gt;dbplyr&lt;/strong&gt; and &lt;strong&gt;sparklyr&lt;/strong&gt;, you can even write queries for database or Apache Spark cluster back ends. The respective queries get translated for the specific back end.&lt;/p&gt;
&lt;p&gt;On the other hand, &lt;strong&gt;tidyverse&lt;/strong&gt; has many package dependencies and it might be hard to install and maintain these dependencies in specific IT environments and production systems. The &lt;strong&gt;tidyverse&lt;/strong&gt; packages are still subject to change but should become more stable in future versions.&lt;/p&gt;
&lt;h2&gt;The data.table Package Ecosystem&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;data.table&lt;/strong&gt; is a highly optimized, in-memory transformation and query interface for tabular data. It is very well suited for operations like joins, value updates and filters on large tables (e.g. 10M rows+). The main reason for the large speed gains lies in the fact that &lt;strong&gt;data.table&lt;/strong&gt; is very memory-efficient and tries to avoid copies of large tables as much as possible.&lt;/p&gt;
&lt;p&gt;Data tables have some additional features compared to conventional data frames. For example, one can apply data transformation functions directly inside the subset operator &lt;code&gt;[&lt;/code&gt;. However, these additional features might lead to constructs which are hard to understand for beginners or for users unfamiliar with &lt;strong&gt;data.table&lt;/strong&gt;.&lt;/p&gt;
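&lt;p&gt;To illustrate, here is a small sketch assuming the &lt;strong&gt;data.table&lt;/strong&gt; package is installed. It uses the built-in &lt;code&gt;mtcars&lt;/code&gt; dataset; the names &lt;code&gt;dt&lt;/code&gt; and &lt;code&gt;kpl&lt;/code&gt; and the horsepower threshold are chosen for illustration only.&lt;/p&gt;
&lt;pre&gt;library(data.table)

dt &lt;- as.data.table(mtcars)

# Filter rows, compute a summary and group, all inside `[`
mpg_by_cyl &lt;- dt[hp &gt; 100, .(mean_mpg = mean(mpg)), by = cyl]
mpg_by_cyl

# := adds a column by reference, without copying the table
dt[, kpl := mpg * 0.425]  # approx. kilometres per litre&lt;/pre&gt;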
&lt;p&gt;The data table is still one of the fastest in-memory tabular formats available. The &lt;strong&gt;data.table&lt;/strong&gt; function &lt;code&gt;fread()&lt;/code&gt; is among the fastest functions for reading large comma-separated files within R (and also compared to other languages). The biggest reason for using &lt;strong&gt;data.table&lt;/strong&gt; is simple: &lt;em&gt;speed&lt;/em&gt;.&lt;/p&gt;
&lt;h2&gt;Pros and Cons&lt;/h2&gt;
&lt;p&gt;Depending on the requirements of the use case, specific package ecosystems stand out against their peers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;In terms of &lt;em&gt;stability&lt;/em&gt; of the code (over years), the &lt;strong&gt;base&lt;/strong&gt; R package should be considered.&lt;/li&gt;
&lt;li&gt;The &lt;em&gt;feature set&lt;/em&gt; for data manipulation seems to be broadest in the &lt;strong&gt;tidyverse&lt;/strong&gt; ecosystem.&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;data.table&lt;/strong&gt; package is (still) the &lt;em&gt;speed&lt;/em&gt; champion.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Interoperability&lt;/em&gt; and &lt;em&gt;consistency&lt;/em&gt; across different data transformation problems seem to be best handled by the &lt;strong&gt;tidyverse&lt;/strong&gt; ecosystem.&lt;/li&gt;
&lt;/ul&gt;

&lt;table class="table table-striped" style="margin-left: auto; margin-right: auto;"&gt;
&lt;thead&gt;&lt;tr&gt;
&lt;th style="text-align:left;"&gt;
&lt;/th&gt;
&lt;th style="text-align:left;"&gt;
base R
&lt;/th&gt;
&lt;th style="text-align:left;"&gt;
tidyverse
&lt;/th&gt;
&lt;th style="text-align:left;"&gt;
data.table
&lt;/th&gt;
&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style="text-align:left;"&gt;
Stability
&lt;/td&gt;
&lt;td style="text-align:left;"&gt;
✅✅
&lt;/td&gt;
&lt;td style="text-align:left;"&gt;
✅
&lt;/td&gt;
&lt;td style="text-align:left;"&gt;
✅
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align:left;"&gt;
Features
&lt;/td&gt;
&lt;td style="text-align:left;"&gt;
🆗
&lt;/td&gt;
&lt;td style="text-align:left;"&gt;
✅✅
&lt;/td&gt;
&lt;td style="text-align:left;"&gt;
✅
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align:left;"&gt;
Speed
&lt;/td&gt;
&lt;td style="text-align:left;"&gt;
❌
&lt;/td&gt;
&lt;td style="text-align:left;"&gt;
✅
&lt;/td&gt;
&lt;td style="text-align:left;"&gt;
✅✅
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align:left;"&gt;
Consistency
&lt;/td&gt;
&lt;td style="text-align:left;"&gt;
❌
&lt;/td&gt;
&lt;td style="text-align:left;"&gt;
✅✅
&lt;/td&gt;
&lt;td style="text-align:left;"&gt;
🆗
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h2&gt;Quiz: Which Package Ecosystem to Choose with Storage Backends?&lt;/h2&gt;
Which R package ecosystem should be chosen if data transformation code needs to be clean, fast and extensible through many storage backends?
&lt;ul&gt;&lt;li&gt;base R&lt;/li&gt;&lt;li&gt;tidyverse&lt;/li&gt;&lt;li&gt;data.table&lt;/li&gt;&lt;/ul&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-advanced-data-transformation/01-introduction/01-introduction/quiz-1"&gt;Start Quiz&lt;/a&gt;
&lt;h2&gt;Quiz: Which Package Ecosystem to Choose for Stability?&lt;/h2&gt;
Which R package ecosystem should be chosen if data transformation code needs to be very stable and not many features are required?
&lt;ul&gt;&lt;li&gt;base R&lt;/li&gt;&lt;li&gt;tidyverse&lt;/li&gt;&lt;li&gt;data.table&lt;/li&gt;&lt;/ul&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-advanced-data-transformation/01-introduction/01-introduction/quiz-2"&gt;Start Quiz&lt;/a&gt;
&lt;h2&gt;Quiz: Which Package Ecosystem to Choose for Large Data Sets?&lt;/h2&gt;
Which R package ecosystem should be chosen if huge data sets need to be processed and therefore maximum performance is required?
&lt;ul&gt;&lt;li&gt;base R&lt;/li&gt;&lt;li&gt;tidyverse&lt;/li&gt;&lt;li&gt;data.table&lt;/li&gt;&lt;/ul&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-advanced-data-transformation/01-introduction/01-introduction/quiz-3"&gt;Start Quiz&lt;/a&gt;
&lt;p&gt;The 3 Doors of Data Transformation is an excerpt from the course &lt;a href="https://www.quantargo.com/courses/course-r-introduction"&gt;Introduction to R&lt;/a&gt;, which is available for free at &lt;a href="https://www.quantargo.com"&gt;quantargo.com&lt;/a&gt;&lt;/p&gt;&lt;p&gt;&lt;a href="https://www.quantargo.com/courses/course-r-introduction"&gt;VIEW FULL COURSE&lt;/a&gt;</description></item><item><title>New Course Available Now: Advanced Data Transformation</title><link>https://www.quantargo.com/blog/post/2021-02-26-new-course-advanced-data-transformation</link><pubDate>Fri, 26 Feb 2021 08:30:00 +0000</pubDate><guid>https://www.quantargo.com/blog/post/2021-02-26-new-course-advanced-data-transformation</guid><category>R</category><category>Blog</category><description>&lt;h2&gt;New Course Available Now: Advanced Data Transformation&lt;/h2&gt;
&lt;img src="https://www.quantargo.com/assets/blog/2021-02-26-new-course-advanced-data-transformation/course_og.png"&gt;
&lt;p&gt;Data comes in many shapes and forms from all kinds of data sources. The first step before any statistical analysis is to bring the data into a suitable format. In R, there are three different package ecosystems to transform data, namely &lt;strong&gt;base R&lt;/strong&gt;, &lt;strong&gt;tidyverse&lt;/strong&gt; and &lt;strong&gt;data.table&lt;/strong&gt;.&lt;/p&gt;
&lt;img src="https://www.quantargo.com/assets/blog/2021-02-26-new-course-advanced-data-transformation/lessons.png"&gt;
&lt;div id="section-1" class="section level3"&gt;
&lt;p&gt;&lt;a href="https://www.quantargo.com/courses/course-r-advanced-data-transformation"&gt;Advanced Data Transformation&lt;/a&gt; covers the most popular ways of transforming data into all kinds shapes and forms.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;base R&lt;/strong&gt; is already integrated into the R language itself&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;tidyverse&lt;/strong&gt; provides many packages for data manipulation, most importantly &lt;em&gt;dplyr&lt;/em&gt; and &lt;em&gt;tidyr&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;data.table&lt;/strong&gt; is a highly optimized, in-memory transformation and query interface for tabular data&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;There is no one-size-fits-all solution to a problem, so in Advanced Data Transformation you will learn how to use the right tool for &lt;em&gt;your&lt;/em&gt; data use cases. For each available package ecosystem it covers all essentials, including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Data Filtering&lt;/li&gt;
&lt;li&gt;Grouping and Aggregating&lt;/li&gt;
&lt;li&gt;Pivoting&lt;/li&gt;
&lt;li&gt;Joins&lt;/li&gt;
&lt;li&gt;and more!&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;➔ &lt;a href="https://www.quantargo.com/courses/course-r-advanced-data-transformation"&gt;&lt;strong&gt;View Course: Advanced Data Transformation&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id="get-your-certificate-with-pro" class="section level3"&gt;
&lt;h3&gt;Get Your Certificate with PRO&lt;/h3&gt;
&lt;p&gt;After completing &lt;em&gt;Advanced Data Transformation&lt;/em&gt; you get a unique certificate, which you can download as a PDF and include in your portfolio!&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.quantargo.com/pricing#courses"&gt;Learn more about PRO&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;</description></item><item><title>Complete the Introduction to R Course for Free until March 7</title><link>https://www.quantargo.com/blog/post/2021-02-23-introduction-to-r-course-completely-free-until-march-7</link><pubDate>Tue, 23 Feb 2021 08:30:00 +0000</pubDate><guid>https://www.quantargo.com/blog/post/2021-02-23-introduction-to-r-course-completely-free-until-march-7</guid><category>R</category><category>Blog</category><description>&lt;h2&gt;Complete the Introduction to R Course for Free until March 7&lt;/h2&gt;
&lt;img src="https://www.quantargo.com/assets/blog/2021-02-23-introduction-to-r-course-completely-free-until-march-7/course_og.png"&gt;
&lt;p&gt;To all of you who want to get started with data science and R we have a special offer! Until March 7 you can finish all of the new &lt;a href="https://www.quantargo.com/courses/course-r-introduction"&gt;Introduction to R&lt;/a&gt; course lessons for free and collect the following badges:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/01-basics"&gt;&lt;strong&gt;R Basics&lt;/strong&gt;&lt;/a&gt;: Start your R journey and learn the most fundamental building blocks&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/02-data-frames-tibbles"&gt;&lt;strong&gt;Data Frames and Tibbles&lt;/strong&gt;&lt;/a&gt;: Create tabular data structures with data frames and see how they compare to tibbles.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/03-dplyr"&gt;&lt;strong&gt;Data Transformation with dplyr&lt;/strong&gt;&lt;/a&gt;: Filter rows, select columns and sort/arrange datasets in combination with the pipe &lt;code&gt;%&amp;gt;%&lt;/code&gt; operator.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/04-ggplot"&gt;&lt;strong&gt;Data Visualization with ggplot2&lt;/strong&gt;&lt;/a&gt;: Understand the core principles of creating expressive visualizations.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The topics have been carefully selected to give you a contemporary introduction to data science with the R programming language. You will also learn how to create beautiful visualizations, plots and charts!&lt;/p&gt;
&lt;div id="get-your-free-certificate" class="section level3"&gt;
&lt;h3&gt;Get Your Free Certificate&lt;/h3&gt;
&lt;p&gt;Each lesson covers key concepts in small understandable chunks. After finishing all lessons you receive a unique certificate for completing the course. Download your certificate as a PDF and include it in your portfolio!&lt;/p&gt;
&lt;/div&gt;
&lt;img src="https://www.quantargo.com/assets/blog/2021-02-23-introduction-to-r-course-completely-free-until-march-7/introduction-badges.png"&gt;
&lt;div id="section-1" class="section level3"&gt;
&lt;p&gt;The course gives a friendly introduction to data science topics, with 4 in-depth lessons that explain key concepts and get you hands-on with code! After March 7th the course will still be available as part of our new &lt;a href="https://www.quantargo.com/pricing"&gt;PRO subscription&lt;/a&gt;, and some chapters will remain free even afterwards.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.quantargo.com/courses/course-r-introduction"&gt;&lt;strong&gt;Start Course and Get Certificate&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Happy learning!&lt;/p&gt;
&lt;/div&gt;</description></item><item><title>Create your first bar chart</title><link>https://www.quantargo.com/courses/course-r-introduction/04-ggplot/05-bar-charts</link><pubDate>Tue, 16 Feb 2021 08:30:00 +0000</pubDate><guid>https://www.quantargo.com/courses/course-r-introduction/04-ggplot/05-bar-charts</guid><category>R</category><category>Course</category><category>Introduction</category><category>Basics</category><category>Interactive</category><description>&lt;img src="https://www.quantargo.com/assets/courses/course-r-introduction/04-ggplot/05-bar-charts.png"&gt;
&lt;ul&gt;
&lt;li&gt;Create your first bar chart using &lt;code&gt;geom_col()&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Fill bars with color using the &lt;code&gt;fill&lt;/code&gt; aesthetic&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;ggplot(___) + 
  geom_col(
    mapping = aes(x = ___, y = ___, 
                  fill = ___)
  )&lt;/pre&gt;
&lt;h2&gt;Introduction to bar charts&lt;/h2&gt;
&lt;p&gt;Bar charts visualize &lt;code&gt;numeric&lt;/code&gt; values grouped by categories. Each category is represented by one bar whose height is defined by its &lt;code&gt;numeric&lt;/code&gt; value.&lt;/p&gt;
&lt;p&gt;Bar charts are well suited to compare values among different groups, e.g. number of votes by party, number of people in different countries or GDP per capita in different countries. Bar charts take up a lot of space and work best if the number of groups to compare is rather small.&lt;/p&gt;
&lt;p&gt;Below you can find an example showing the number of people (in millions) in the five biggest countries by population in 2007:&lt;/p&gt;
&lt;img src="https://www.quantargo.com/assets/courses/course-r-introduction/04-ggplot/05-bar-charts_files/figure-html/unnamed-chunk-2-1.png"&gt;
&lt;h2&gt;Creating a simple bar chart&lt;/h2&gt;
&lt;p&gt;In &lt;strong&gt;ggplot2&lt;/strong&gt;, bar charts are created using the &lt;code&gt;geom_col()&lt;/code&gt; geometric layer. The &lt;code&gt;x&lt;/code&gt; aesthetic mapping defines the different bars to be plotted. The height of each bar is defined by the variable specified in the &lt;code&gt;y&lt;/code&gt; aesthetic mapping. Both mappings, &lt;code&gt;x&lt;/code&gt; and &lt;code&gt;y&lt;/code&gt;, are required for &lt;code&gt;geom_col()&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Let’s create our first bar chart with the &lt;code&gt;gapminder_top5&lt;/code&gt; dataset. It contains population (in millions) and life expectancy data for the biggest countries by population in 2007.&lt;/p&gt;
&lt;pre&gt;ggplot(gapminder_top5) + 
  geom_col(aes(x = country, y = pop))&lt;/pre&gt;
&lt;img src="https://www.quantargo.com/assets/courses/course-r-introduction/04-ggplot/05-bar-charts_files/figure-html/unnamed-chunk-4-1.png"&gt;
&lt;p&gt;We see that the resulting bars are sorted by the country names in alphabetical order by default.&lt;/p&gt;
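&lt;p&gt;If the bars should be ordered by value instead, one common approach is to wrap the x variable in &lt;code&gt;reorder()&lt;/code&gt;. The sketch below is self-contained and rebuilds a stand-in for &lt;code&gt;gapminder_top5&lt;/code&gt; from the &lt;strong&gt;gapminder&lt;/strong&gt; package; how the course dataset was actually prepared is an assumption.&lt;/p&gt;
&lt;pre&gt;library(ggplot2)
library(gapminder)

# Assumed reconstruction: five most populous countries in 2007
g2007 &lt;- gapminder[gapminder$year == 2007, ]
gapminder_top5 &lt;- head(g2007[order(g2007$pop, decreasing = TRUE), ], 5)
gapminder_top5$pop &lt;- gapminder_top5$pop / 1e6  # population in millions

# reorder() sorts the country levels by descending population
ggplot(gapminder_top5) + 
  geom_col(aes(x = reorder(country, -pop), y = pop)) + 
  labs(x = "country")&lt;/pre&gt;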
&lt;h2&gt;Exercise: Plot life expectancy by country&lt;/h2&gt;
&lt;p&gt;Create a bar chart showing the life expectancy of the five biggest countries by population in 2007.&lt;/p&gt;
&lt;ol style="list-style-type: decimal"&gt;
&lt;li&gt;Use the &lt;code&gt;ggplot()&lt;/code&gt; function and specify the &lt;code&gt;gapminder_top5&lt;/code&gt; dataset as input&lt;/li&gt;
&lt;li&gt;Add a &lt;code&gt;geom_col()&lt;/code&gt; layer to the plot&lt;/li&gt;
&lt;li&gt;Plot one bar for each &lt;code&gt;country&lt;/code&gt; (x aesthetic)&lt;/li&gt;
&lt;li&gt;Use life expectancy &lt;code&gt;lifeExp&lt;/code&gt; as bar height (y aesthetic)&lt;/li&gt;
&lt;/ol&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/04-ggplot/05-bar-charts/exercise-05-01"&gt;Start Exercise&lt;/a&gt;
&lt;h2&gt;Filling bars with color&lt;/h2&gt;
&lt;p&gt;Like other geoms, &lt;code&gt;geom_col()&lt;/code&gt; allows users to map additional dataset variables to the color of the bars. The &lt;code&gt;fill&lt;/code&gt; aesthetic fills the entire bar with color. It is easily confused with the &lt;code&gt;color&lt;/code&gt; aesthetic, which sets the &lt;em&gt;line&lt;/em&gt; color of each bar’s border instead of the &lt;em&gt;fill&lt;/em&gt; color.&lt;/p&gt;
&lt;p&gt;Based on the &lt;code&gt;gapminder_top5&lt;/code&gt; dataset we plot the population (in millions) of the biggest countries and use the &lt;code&gt;continent&lt;/code&gt; variable to color each bar:&lt;/p&gt;
&lt;pre&gt;ggplot(gapminder_top5) + 
  geom_col(aes(x = country, y = pop, fill = continent))&lt;/pre&gt;
&lt;img src="https://www.quantargo.com/assets/courses/course-r-introduction/04-ggplot/05-bar-charts_files/figure-html/unnamed-chunk-6-1.png"&gt;
&lt;p&gt;Since &lt;code&gt;continent&lt;/code&gt; is a categorical variable, each continent gets its own distinct color. Let’s see what happens if we use a &lt;code&gt;numeric&lt;/code&gt; variable like life expectancy &lt;code&gt;lifeExp&lt;/code&gt; instead:&lt;/p&gt;
&lt;pre&gt;ggplot(gapminder_top5) + 
  geom_col(aes(x = country, y = pop, fill = lifeExp))&lt;/pre&gt;
&lt;img src="https://www.quantargo.com/assets/courses/course-r-introduction/04-ggplot/05-bar-charts_files/figure-html/unnamed-chunk-7-1.png"&gt;
&lt;p&gt;The bar colors now follow the &lt;strong&gt;continuous&lt;/strong&gt; legend on the right. We see that &lt;code&gt;numeric&lt;/code&gt; variables can also be used to &lt;code&gt;fill&lt;/code&gt; bars.&lt;/p&gt;
&lt;h2&gt;Exercise: Plot population size by country&lt;/h2&gt;
&lt;p&gt;Create a bar chart showing the population (in millions) of the five biggest countries by population in 2007.&lt;/p&gt;
&lt;ol style="list-style-type: decimal"&gt;
&lt;li&gt;Use the &lt;code&gt;ggplot()&lt;/code&gt; function and specify the &lt;code&gt;gapminder_top5&lt;/code&gt; dataset as input&lt;/li&gt;
&lt;li&gt;Add a &lt;code&gt;geom_col()&lt;/code&gt; layer to the plot&lt;/li&gt;
&lt;li&gt;Plot one bar for each &lt;code&gt;country&lt;/code&gt; (x aesthetic)&lt;/li&gt;
&lt;li&gt;Use population &lt;code&gt;pop&lt;/code&gt; as bar height (y aesthetic)&lt;/li&gt;
&lt;li&gt;Use the GDP per capita &lt;code&gt;gdpPercap&lt;/code&gt; as &lt;code&gt;fill&lt;/code&gt; aesthetic&lt;/li&gt;
&lt;/ol&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/04-ggplot/05-bar-charts/exercise-05-02"&gt;Start Exercise&lt;/a&gt;
&lt;h2&gt;Stacked bar charts&lt;/h2&gt;
&lt;p&gt;In some circumstances it might be useful to plot multiple numeric values within each bar. Examples are numeric values describing one specific entity (e.g. customers) split among various categories (customer segments) so that the bar height represents the total number (all customers).&lt;/p&gt;
&lt;p&gt;The plot below shows the number of phones (in thousands) by region from 1956 to 1961 as a stacked bar chart:&lt;/p&gt;
&lt;pre&gt;ggplot(world_phones) + 
  geom_col(aes(x = year, y = phones,
               fill = region))&lt;/pre&gt;
&lt;img src="https://www.quantargo.com/assets/courses/course-r-introduction/04-ggplot/05-bar-charts_files/figure-html/unnamed-chunk-10-1.png"&gt;
&lt;h2&gt;Exercise: Plot number of crimes by US states&lt;/h2&gt;
&lt;p&gt;Create a bar chart showing the number of crimes by US state per 100,000 residents in 1973.&lt;/p&gt;
&lt;ol style="list-style-type: decimal"&gt;
&lt;li&gt;Use the &lt;code&gt;ggplot()&lt;/code&gt; function and specify the &lt;code&gt;us_arrests&lt;/code&gt; dataset as input&lt;/li&gt;
&lt;li&gt;Add a &lt;code&gt;geom_col()&lt;/code&gt; layer to the plot&lt;/li&gt;
&lt;li&gt;Plot one bar for each &lt;code&gt;state&lt;/code&gt; (x aesthetic)&lt;/li&gt;
&lt;li&gt;Use the number of &lt;code&gt;cases&lt;/code&gt; as bar height (y aesthetic)&lt;/li&gt;
&lt;li&gt;Use the &lt;code&gt;crime&lt;/code&gt; type as &lt;code&gt;fill&lt;/code&gt; aesthetic.&lt;/li&gt;
&lt;/ol&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/04-ggplot/05-bar-charts/exercise-05-04"&gt;Start Exercise&lt;/a&gt;
&lt;p&gt;Create your first bar chart is an excerpt from the course &lt;a href="https://www.quantargo.com/courses/course-r-introduction"&gt;Introduction to R&lt;/a&gt;, which is available for free at &lt;a href="https://www.quantargo.com"&gt;quantargo.com&lt;/a&gt;&lt;/p&gt;&lt;p&gt;&lt;a href="https://www.quantargo.com/courses/course-r-introduction"&gt;VIEW FULL COURSE&lt;/a&gt;</description></item><item><title>Create a line graph with ggplot</title><link>https://www.quantargo.com/courses/course-r-introduction/04-ggplot/04-linegraph</link><pubDate>Sat, 05 Sep 2020 09:56:42 +0000</pubDate><guid>https://www.quantargo.com/courses/course-r-introduction/04-ggplot/04-linegraph</guid><category>R</category><category>Course</category><category>Introduction</category><category>Basics</category><category>Interactive</category><description>&lt;img src="https://www.quantargo.com/assets/courses/course-r-introduction/04-ggplot/04-linegraph.png"&gt;
&lt;p&gt;Use the &lt;code&gt;geom_line()&lt;/code&gt; layer to draw line graphs and customize their styling using the &lt;code&gt;color&lt;/code&gt; aesthetic. Specify which observations belong to each line with the &lt;code&gt;group&lt;/code&gt; aesthetic.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Create your first line graph using &lt;code&gt;geom_line()&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Define how different lines are connected using the &lt;code&gt;group&lt;/code&gt; parameter&lt;/li&gt;
&lt;li&gt;Change the line color of a line graph using the &lt;code&gt;color&lt;/code&gt; parameter&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;ggplot(___) + 
  geom_line(
    mapping = aes(x = ___, y = ___, 
                  group = ___, 
                  color = ___)
  )&lt;/pre&gt;
&lt;h2&gt;Introduction to line graphs&lt;/h2&gt;
&lt;p&gt;Line graphs are used to visualize the trajectory of one numeric variable against another. Unlike in scatter plots, the x- and y-coordinates are not drawn as points but are connected by lines. Line graphs are typically used when one variable changes &lt;em&gt;continuously&lt;/em&gt; against another numeric variable. This is the case for most time series charts (e.g. prices, customers, CO2 concentration, temperature over time), continuous functions (e.g. sine &lt;code&gt;sin(x)&lt;/code&gt;) and other near-continuous relationships (e.g. real-world supply/demand curves).&lt;/p&gt;
&lt;img src="https://www.quantargo.com/assets/courses/course-r-introduction/04-ggplot/04-linegraph_files/figure-html/geom_point-1.png"&gt;
&lt;h2&gt;Quiz: Line Graphs&lt;/h2&gt;
Which of the following statements about line graphs are correct?
&lt;ul&gt;&lt;li&gt;Line graphs are typically used to plot the relationship between categorical and numeric variables.&lt;/li&gt;&lt;li&gt;Line graphs are typically used to plot variables of type &lt;code&gt;numeric&lt;/code&gt;.&lt;/li&gt;&lt;li&gt;For line graphs it is not necessary that the relationship between two variables shows continuity.&lt;/li&gt;&lt;li&gt;Line graphs can be used to plot time series.&lt;/li&gt;&lt;/ul&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/04-ggplot/04-linegraph/quiz-1"&gt;Start Quiz&lt;/a&gt;
&lt;h2&gt;Creating a simple line graph&lt;/h2&gt;
&lt;p&gt;Japan is among the countries with the highest life expectancy. Using the &lt;code&gt;gapminder_japan&lt;/code&gt; dataset we determine how the life expectancy in Japan has developed over time. We need to:&lt;/p&gt;
&lt;ol style="list-style-type: decimal"&gt;
&lt;li&gt;Specify the dataset within &lt;code&gt;ggplot()&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Define the &lt;code&gt;geom_line()&lt;/code&gt; plot layer&lt;/li&gt;
&lt;li&gt;Map the &lt;code&gt;year&lt;/code&gt; to the x-axis and the life expectancy &lt;code&gt;lifeExp&lt;/code&gt; to the y-axis with the &lt;code&gt;aes()&lt;/code&gt; function&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Note that the &lt;strong&gt;ggplot2&lt;/strong&gt; package needs to be loaded first with &lt;code&gt;library(ggplot2)&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;library(ggplot2)
ggplot(gapminder_japan) + 
  geom_line(
    mapping = aes(x = year, y = lifeExp)
  )&lt;/pre&gt;
&lt;img src="https://www.quantargo.com/assets/courses/course-r-introduction/04-ggplot/04-linegraph_files/figure-html/unnamed-chunk-4-1.png"&gt;
&lt;h2&gt;Exercise: Plot life expectancy of Brazil&lt;/h2&gt;
&lt;p&gt;Create your first line graph showing the life expectancy of people from Brazil over time.&lt;/p&gt;
&lt;ol style="list-style-type: decimal"&gt;
&lt;li&gt;Use the &lt;code&gt;ggplot()&lt;/code&gt; function and specify the &lt;code&gt;gapminder_brazil&lt;/code&gt; dataset as input&lt;/li&gt;
&lt;li&gt;Add a &lt;code&gt;geom_line()&lt;/code&gt; layer to the plot&lt;/li&gt;
&lt;li&gt;Map the &lt;code&gt;year&lt;/code&gt; to the x-axis and the life expectancy &lt;code&gt;lifeExp&lt;/code&gt; to the y-axis with the &lt;code&gt;aes()&lt;/code&gt; function&lt;/li&gt;
&lt;/ol&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/04-ggplot/04-linegraph/exercise-04-01"&gt;Start Exercise&lt;/a&gt;
&lt;h2&gt;Adding more lines&lt;/h2&gt;
&lt;p&gt;So far we have only looked at single lines, but what if the dataset contains multiple countries and we want to tell them apart?&lt;/p&gt;
&lt;p&gt;Line graphs are often extended and used for the comparison of two or more lines. Multiple line graphs show the absolute differences between observations but also how the specific trajectories relate to each other. For example, let’s answer the question: &lt;em&gt;How has life expectancy changed in the countries Austria and Hungary over time?&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;We first filter the dataset for both countries of interest. Then, we set the variable &lt;code&gt;country&lt;/code&gt; as the &lt;code&gt;group&lt;/code&gt; argument for the aesthetic mapping. The group argument tells ggplot which observations belong together and should be connected through lines. By specifying the &lt;code&gt;country&lt;/code&gt; variable ggplot creates a separate line for each country. To make the lines easier to distinguish we also map &lt;code&gt;color&lt;/code&gt; to the &lt;code&gt;country&lt;/code&gt; so that each country line has a different color.&lt;/p&gt;
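&lt;pre&gt;library(dplyr)    # provides filter()
library(ggplot2)

# The gapminder data is assumed to be loaded already
gapminder_comparison &lt;- 
  filter(gapminder, country %in% c("Austria", "Hungary"))

# One line per country, each in its own color
ggplot(data = gapminder_comparison) + 
  geom_line(mapping = aes(x = year, y = lifeExp, 
                          group = country, 
                          color = country))&lt;/pre&gt;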
&lt;img src="https://www.quantargo.com/assets/courses/course-r-introduction/04-ggplot/04-linegraph_files/figure-html/group-1.png"&gt;
&lt;p&gt;Note that ggplot also separates the lines correctly if only the &lt;code&gt;color&lt;/code&gt; mapping is specified (the &lt;code&gt;group&lt;/code&gt; parameter is implicitly set).&lt;/p&gt;
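&lt;p&gt;A minimal sketch of this shorter form is shown below. It assumes that the &lt;code&gt;gapminder&lt;/code&gt; data comes from the &lt;strong&gt;gapminder&lt;/strong&gt; package and uses base R’s &lt;code&gt;subset()&lt;/code&gt; in place of &lt;code&gt;filter()&lt;/code&gt; so that no further package is needed.&lt;/p&gt;
&lt;pre&gt;library(ggplot2)
library(gapminder)

gapminder_comparison &lt;- 
  subset(gapminder, country %in% c("Austria", "Hungary"))

# No group aesthetic: ggplot2 groups by the discrete color variable
ggplot(data = gapminder_comparison) + 
  geom_line(mapping = aes(x = year, y = lifeExp, color = country))&lt;/pre&gt;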
&lt;h2&gt;Exercise: Compare life expectancy&lt;/h2&gt;
&lt;p&gt;Create a line graph to compare the life expectancy &lt;code&gt;lifeExp&lt;/code&gt; in the countries Japan, Brazil and India.&lt;/p&gt;
&lt;ol style="list-style-type: decimal"&gt;
&lt;li&gt;Use the data set &lt;code&gt;gapminder_comparison&lt;/code&gt; in your &lt;code&gt;ggplot()&lt;/code&gt; function which contains only data for the countries &lt;code&gt;Japan&lt;/code&gt;, &lt;code&gt;Brazil&lt;/code&gt; and &lt;code&gt;India&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Create a line graph with the &lt;code&gt;geom_line()&lt;/code&gt; function&lt;/li&gt;
&lt;li&gt;Map the &lt;code&gt;year&lt;/code&gt; to the x-axis and the life expectancy &lt;code&gt;lifeExp&lt;/code&gt; to the y-axis with the &lt;code&gt;aes()&lt;/code&gt; function&lt;/li&gt;
&lt;li&gt;Map the &lt;code&gt;group&lt;/code&gt; and the &lt;code&gt;color&lt;/code&gt; parameter to the &lt;code&gt;country&lt;/code&gt; variable.&lt;/li&gt;
&lt;/ol&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/04-ggplot/04-linegraph/exercise-04-02"&gt;Start Exercise&lt;/a&gt;
&lt;h2&gt;Exercise: Compare populations&lt;/h2&gt;
&lt;p&gt;Compare the population growth over the last decades in the countries Austria, Hungary and Serbia.&lt;/p&gt;
&lt;ol style="list-style-type: decimal"&gt;
&lt;li&gt;Use the data set &lt;code&gt;gapminder_comparison&lt;/code&gt; in your &lt;code&gt;ggplot()&lt;/code&gt; function which contains only data for the countries in question.&lt;/li&gt;
&lt;li&gt;Create a line graph with &lt;code&gt;geom_line()&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Map the &lt;code&gt;year&lt;/code&gt; to the x-axis and the population &lt;code&gt;pop&lt;/code&gt; to the y-axis with &lt;code&gt;aes()&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Map the &lt;code&gt;group&lt;/code&gt; and the &lt;code&gt;color&lt;/code&gt; parameter to the &lt;code&gt;country&lt;/code&gt; variable.&lt;/li&gt;
&lt;/ol&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/04-ggplot/04-linegraph/exercise-04-03"&gt;Start Exercise&lt;/a&gt;
&lt;h2&gt;Quiz: Malformed Plot&lt;/h2&gt;
&lt;pre&gt;gapminder_comparison &lt;- filter(gapminder, country %in% c("Brazil", "China", "India"))
ggplot(data = gapminder_comparison) + 
  geom_line(mapping = aes(x = year, y = pop))&lt;/pre&gt;
&lt;img src="https://www.quantargo.com/assets/courses/course-r-introduction/04-ggplot/04-linegraph_files/figure-html/unnamed-chunk-6-1.png"&gt;
What has gone wrong in this plot?
&lt;ul&gt;&lt;li&gt;The population numbers are scaled differently in the plotted countries.&lt;/li&gt;&lt;li&gt;The &lt;code&gt;group&lt;/code&gt; aesthetic should be used to map the population &lt;code&gt;pop&lt;/code&gt; variable.&lt;/li&gt;&lt;li&gt;The &lt;code&gt;color&lt;/code&gt; aesthetic should be used to map the life expectancy &lt;code&gt;lifeExp&lt;/code&gt; variable.&lt;/li&gt;&lt;li&gt;The &lt;code&gt;group&lt;/code&gt; aesthetic should be used to map the &lt;code&gt;year&lt;/code&gt; variable.&lt;/li&gt;&lt;li&gt;The &lt;code&gt;group&lt;/code&gt; aesthetic should be used to map the &lt;code&gt;country&lt;/code&gt; variable.&lt;/li&gt;&lt;/ul&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/04-ggplot/04-linegraph/quiz-2"&gt;Start Quiz&lt;/a&gt;
&lt;p&gt;Create a line graph with ggplot is an excerpt from the course &lt;a href="https://www.quantargo.com/courses/course-r-introduction"&gt;Introduction to R&lt;/a&gt;, which is available for free at &lt;a href="https://www.quantargo.com"&gt;quantargo.com&lt;/a&gt;&lt;/p&gt;&lt;p&gt;&lt;a href="https://www.quantargo.com/courses/course-r-introduction"&gt;VIEW FULL COURSE&lt;/a&gt;</description></item><item><title>Data Science Conference Austria 2020</title><link>https://www.quantargo.com/blog/post/2020-09-04-dsc-austria-2020-announcement</link><pubDate>Sat, 05 Sep 2020 00:00:00 +0000</pubDate><guid>https://www.quantargo.com/blog/post/2020-09-04-dsc-austria-2020-announcement</guid><category>R</category><category>Blog</category><description>&lt;h2&gt;Data Science Conference Austria 2020&lt;/h2&gt;
&lt;img src="https://www.quantargo.com/assets/blog/2020-09-04-dsc-austria-2020-announcement/dscaustriaheader.png"&gt;
&lt;p&gt;Data Science Conference (DSC) Austria is knocking on YOUR door - and it is all for free! 👌💪🤞&lt;/p&gt;
&lt;p&gt;DSC Austria will happen on September 8-9th and during the event, you will get a chance to listen to over 15 high-quality talks and 8 tech tutorials on the topic of AI &amp;amp; ML, Data-Driven Decision and Data &amp;amp; AI Literacy - but that is not all!&lt;/p&gt;
&lt;p&gt;With the DSC Austria ticket you will get:&lt;/p&gt;
&lt;p&gt;✅ Full access to DSC Austria 2020 talks and sessions&lt;/p&gt;
&lt;p&gt;✅ Entry to virtual networking sessions&lt;/p&gt;
&lt;p&gt;✅ Online certificate of attendance&lt;/p&gt;
&lt;p&gt;Check it out and reserve your spot&lt;/p&gt;
&lt;p&gt;&lt;a href="https://austria.datasciconference.com/ticket/"&gt;RESERVE FREE TICKET&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;DSC Austria 2020 Program&lt;/h2&gt;
&lt;img src="https://www.quantargo.com/assets/blog/2020-09-04-dsc-austria-2020-announcement/philippsinger.jpg"&gt;
&lt;p&gt;On September 8th you can attend 2 Tech Tutorials &amp;amp; 3 Data Discussions. The tutorials are &lt;em&gt;Use Julia for your Scientific Computing Jobs!&lt;/em&gt; by &lt;a href="https://austria.datasciconference.com/speakers"&gt;Przemyslaw Szufel&lt;/a&gt; from &lt;a href="http://nunatakcapital.com"&gt;Nunatak Capital&lt;/a&gt; and &lt;em&gt;Recommender Systems using Deep Graph Library and Apache MXNet&lt;/em&gt; by &lt;a href="https://austria.datasciconference.com/speakers"&gt;Cyrus Vahid&lt;/a&gt; from &lt;a href="https://aws.amazon.com"&gt;AWS&lt;/a&gt;. The data discussions are &lt;em&gt;Are Robo Bankers on our Doorstep?&lt;/em&gt;, &lt;em&gt;May AI be Profitable and Ethical at the Same Time?&lt;/em&gt; and &lt;em&gt;How AI is Fostering Dehumanization of Decision Making?&lt;/em&gt;. Our panelists will be &lt;a href="https://austria.datasciconference.com/speakers"&gt;Martin Moessler&lt;/a&gt;, &lt;a href="https://austria.datasciconference.com/speakers"&gt;Craig Matthews&lt;/a&gt;, &lt;a href="https://austria.datasciconference.com/speakers"&gt;Georg Koldorfer&lt;/a&gt;, &lt;a href="https://austria.datasciconference.com/speakers"&gt;Aleksandra Przegalinska &amp;amp; Wolfgang Kienreich&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;On September 9th you will be able to listen to 7 excellent talks &amp;amp; participate in 2 networking sessions. We will start our day with the keynote talk &lt;em&gt;Automated Machine Learning for Fast Experiments and Prototypes&lt;/em&gt; delivered by &lt;a href="https://austria.datasciconference.com/speakers/"&gt;Philipp Singer&lt;/a&gt; &amp;amp; &lt;a href="https://austria.datasciconference.com/speakers/"&gt;Dmitry Gordeev&lt;/a&gt;, from &lt;a href="https://www.h2o.ai"&gt;h2o.ai&lt;/a&gt;. After that, you will be able to listen to experts such as &lt;a href="https://austria.datasciconference.com/speakers"&gt;Dragan Pleskonjic&lt;/a&gt;, &lt;a href="https://austria.datasciconference.com/speakers/"&gt;Ronald Hochreiter&lt;/a&gt;, &lt;a href="https://austria.datasciconference.com/speakers"&gt;Valentina Djordjevic&lt;/a&gt; and others.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://austria.datasciconference.com/schedule"&gt;CHECK FULL PROGRAM&lt;/a&gt;&lt;/p&gt;</description></item><item><title>Specify additional aesthetics for points</title><link>https://www.quantargo.com/courses/course-r-introduction/04-ggplot/03-scatterplot-additional-aesthetics</link><pubDate>Tue, 28 Jul 2020 12:33:47 +0000</pubDate><guid>https://www.quantargo.com/courses/course-r-introduction/04-ggplot/03-scatterplot-additional-aesthetics</guid><category>R</category><category>Course</category><category>Introduction</category><category>Basics</category><category>Interactive</category><description>&lt;img src="https://www.quantargo.com/assets/courses/course-r-introduction/04-ggplot/03-scatterplot-additional-aesthetics.png"&gt;
&lt;p&gt;&lt;strong&gt;ggplot2&lt;/strong&gt; implements the grammar of graphics to map attributes from a data set to plot features through &lt;em&gt;aesthetics&lt;/em&gt;. This framework can be used to adjust the point &lt;code&gt;size&lt;/code&gt;, &lt;code&gt;color&lt;/code&gt; and transparency &lt;code&gt;alpha&lt;/code&gt; of points in a scatter plot.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Add additional plotting dimensions through aesthetics&lt;/li&gt;
&lt;li&gt;Adjust the point size of a scatter plot using the &lt;code&gt;size&lt;/code&gt; parameter&lt;/li&gt;
&lt;li&gt;Change the point color of a scatter plot using the &lt;code&gt;color&lt;/code&gt; parameter&lt;/li&gt;
&lt;li&gt;Set a parameter &lt;code&gt;alpha&lt;/code&gt; to change the transparency of all points&lt;/li&gt;
&lt;li&gt;Differentiate between aesthetic mappings and constant parameters&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;ggplot(___) + 
  geom_point(
    mapping = aes(x = ___, y = ___, 
                  color = ___, 
                  size  = ___),
    alpha  = ___
  )&lt;/pre&gt;
&lt;h2&gt;Adding more plot aesthetics&lt;/h2&gt;
&lt;p&gt;In their most basic form scatter plots can only visualize datasets in two dimensions through the &lt;code&gt;x&lt;/code&gt; and &lt;code&gt;y&lt;/code&gt; aesthetics of the &lt;code&gt;geom_point()&lt;/code&gt; layer. However, most data sets have more than two variables and thus might require additional plotting dimensions. &lt;code&gt;ggplot()&lt;/code&gt; makes it very easy to map additional variables to different plotting aesthetics like &lt;code&gt;size&lt;/code&gt;, transparency &lt;code&gt;alpha&lt;/code&gt; and &lt;code&gt;color&lt;/code&gt;. &lt;/p&gt;
&lt;p&gt;Let’s consider the &lt;code&gt;gapminder_2007&lt;/code&gt; dataset which contains the variables GDP per capita &lt;code&gt;gdpPercap&lt;/code&gt; and life expectancy &lt;code&gt;lifeExp&lt;/code&gt; for 142 countries in the year 2007:&lt;/p&gt;
&lt;pre&gt;ggplot(gapminder_2007) + 
  geom_point(aes(x = gdpPercap, y = lifeExp))&lt;/pre&gt;
&lt;img src="https://www.quantargo.com/assets/courses/course-r-introduction/04-ggplot/03-scatterplot-additional-aesthetics_files/figure-html/unnamed-chunk-3-1.png"&gt;
&lt;p&gt;Mapping the &lt;code&gt;continent&lt;/code&gt; variable to the point &lt;code&gt;color&lt;/code&gt; aesthetic and the population &lt;code&gt;pop&lt;/code&gt; (in millions) to the point &lt;code&gt;size&lt;/code&gt;, we obtain a much richer plot that shows four variables from the data set:&lt;/p&gt;
&lt;img src="https://www.quantargo.com/assets/courses/course-r-introduction/04-ggplot/03-scatterplot-additional-aesthetics_files/figure-html/unnamed-chunk-4-1.png"&gt;
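&lt;p&gt;A plot with these four mappings could be written roughly as follows. This is a minimal sketch which assumes that &lt;code&gt;gapminder_2007&lt;/code&gt; is the 2007 slice of the &lt;code&gt;gapminder&lt;/code&gt; data from the &lt;strong&gt;gapminder&lt;/strong&gt; package; the course dataset may be prepared differently (for example with &lt;code&gt;pop&lt;/code&gt; in millions) and the figure above may use additional styling:&lt;/p&gt;
&lt;pre&gt;library(ggplot2)
library(gapminder)

# Assumption: the course dataset corresponds to the year 2007 of gapminder
gapminder_2007 &lt;- subset(gapminder, year == 2007)

# x, y, color and size each carry one variable of the data set
p &lt;- ggplot(gapminder_2007) + 
  geom_point(aes(x = gdpPercap, y = lifeExp,
                 color = continent,
                 size  = pop))
p&lt;/pre&gt;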
&lt;h2&gt;Quiz: geom_point() Aesthetics&lt;/h2&gt;
Which aesthetics can be specified for &lt;code&gt;geom_point()&lt;/code&gt;?
&lt;ul&gt;&lt;li&gt;&lt;code&gt;geom_line&lt;/code&gt;&lt;/li&gt;&lt;li&gt;&lt;code&gt;color&lt;/code&gt;&lt;/li&gt;&lt;li&gt;&lt;code&gt;point&lt;/code&gt;&lt;/li&gt;&lt;li&gt;&lt;code&gt;alpha&lt;/code&gt;&lt;/li&gt;&lt;li&gt;&lt;code&gt;size&lt;/code&gt;&lt;/li&gt;&lt;/ul&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/04-ggplot/03-scatterplot-additional-aesthetics/quiz-1"&gt;Start Quiz&lt;/a&gt;
&lt;h2&gt;Adjusting point color&lt;/h2&gt;
&lt;pre&gt;ggplot(___) + 
  geom_point(
    mapping = aes(x = ___, y = ___, 
                  color = ___, 
                  size  = ___),
    alpha  = ___
  )&lt;/pre&gt;
&lt;p&gt;Typically, the point color is used to introduce a new dimension to a scatter plot. In ggplot we use the &lt;code&gt;color&lt;/code&gt; aesthetic to specify the mapping of a variable to the color of the points.&lt;/p&gt;
&lt;p&gt;For the &lt;code&gt;gapminder_2007&lt;/code&gt; dataset we can plot the GDP per capita &lt;code&gt;gdpPercap&lt;/code&gt; vs. the life expectancy &lt;code&gt;lifeExp&lt;/code&gt; as follows:&lt;/p&gt;
&lt;pre&gt;ggplot(gapminder_2007) + 
  geom_point(aes(x = gdpPercap, y = lifeExp))&lt;/pre&gt;
&lt;img src="https://www.quantargo.com/assets/courses/course-r-introduction/04-ggplot/03-scatterplot-additional-aesthetics_files/figure-html/unnamed-chunk-6-1.png"&gt;
&lt;p&gt;To color each point based on the &lt;code&gt;continent&lt;/code&gt; of each country we can use:&lt;/p&gt;
&lt;pre&gt;ggplot(gapminder_2007) + 
  geom_point(aes(x = gdpPercap, y = lifeExp,
                 color = continent))&lt;/pre&gt;
&lt;img src="https://www.quantargo.com/assets/courses/course-r-introduction/04-ggplot/03-scatterplot-additional-aesthetics_files/figure-html/unnamed-chunk-7-1.png"&gt;
&lt;p&gt;In the resulting plot each point is colored according to the &lt;code&gt;continent&lt;/code&gt; of its country. Because &lt;code&gt;continent&lt;/code&gt; is a categorical variable, &lt;code&gt;ggplot&lt;/code&gt; picks a discrete color scale with one distinct color per category.&lt;/p&gt;
&lt;p&gt;By contrast, let’s see what the plot looks like if we color the points by the numeric variable population &lt;code&gt;pop&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;ggplot(gapminder_2007) + 
  geom_point(aes(x = gdpPercap, y = lifeExp,
                 color = pop))&lt;/pre&gt;
&lt;img src="https://www.quantargo.com/assets/courses/course-r-introduction/04-ggplot/03-scatterplot-additional-aesthetics_files/figure-html/unnamed-chunk-8-1.png"&gt;
&lt;p&gt;The color scale is now continuous, as can be seen in the legend, and the light-blue points are the countries with the largest populations (China and India).&lt;/p&gt;
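&lt;p&gt;Whether the color legend is discrete or continuous follows from the type of the mapped variable. A quick check could look like this, assuming that &lt;code&gt;gapminder_2007&lt;/code&gt; can be derived from the &lt;strong&gt;gapminder&lt;/strong&gt; package as shown (the result variable names are only illustrative):&lt;/p&gt;
&lt;pre&gt;library(gapminder)

# Assumption: the course dataset corresponds to the year 2007 of gapminder
gapminder_2007 &lt;- subset(gapminder, year == 2007)

# A factor is categorical, so ggplot2 uses distinct colors per level
continent_is_factor &lt;- is.factor(gapminder_2007$continent)

# A numeric variable is continuous, so ggplot2 uses a color gradient
pop_is_numeric &lt;- is.numeric(gapminder_2007$pop)

continent_is_factor
pop_is_numeric&lt;/pre&gt;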
&lt;h2&gt;Exercise: Reconstruct Gapminder graph&lt;/h2&gt;
&lt;p&gt;Reconstruct the following graph which shows the relationship between GDP per capita and life expectancy for the year 2007:&lt;/p&gt;
&lt;img src="https://www.quantargo.com/assets/courses/course-r-introduction/04-ggplot/03-scatterplot-additional-aesthetics_files/figure-html/unnamed-chunk-9-1.png"&gt;
&lt;ol style="list-style-type: decimal"&gt;
&lt;li&gt;Use the &lt;code&gt;ggplot()&lt;/code&gt; function and specify the &lt;code&gt;gapminder_2007&lt;/code&gt; dataset as input&lt;/li&gt;
&lt;li&gt;Add a &lt;code&gt;geom_point&lt;/code&gt; layer to the plot and create a scatter plot showing the GDP per capita &lt;code&gt;gdpPercap&lt;/code&gt; on the x-axis and the life expectancy &lt;code&gt;lifeExp&lt;/code&gt; on the y-axis&lt;/li&gt;
&lt;li&gt;Make the &lt;code&gt;color&lt;/code&gt; aesthetic of the points unique for each &lt;code&gt;continent&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/04-ggplot/03-scatterplot-additional-aesthetics/exercise-03-02"&gt;Start Exercise&lt;/a&gt;
&lt;h2&gt;Exercise: Create a colored scatter plot with DavisClean&lt;/h2&gt;
&lt;p&gt;The &lt;code&gt;DavisClean&lt;/code&gt; dataset contains the height and weight measurements of 199 people.&lt;/p&gt;
&lt;ol style="list-style-type: decimal"&gt;
&lt;li&gt;Use the &lt;code&gt;ggplot()&lt;/code&gt; function and specify the &lt;code&gt;DavisClean&lt;/code&gt; dataset as input&lt;/li&gt;
&lt;li&gt;Add a &lt;code&gt;geom_point()&lt;/code&gt; layer to the plot and create a scatter plot showing the &lt;code&gt;weight&lt;/code&gt; on the x- and the &lt;code&gt;height&lt;/code&gt; on the y-axis&lt;/li&gt;
&lt;li&gt;Make the &lt;code&gt;color&lt;/code&gt; aesthetic of the points unique by the &lt;code&gt;sex&lt;/code&gt; of each individual.&lt;/li&gt;
&lt;/ol&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/04-ggplot/03-scatterplot-additional-aesthetics/exercise-03-06"&gt;Start Exercise&lt;/a&gt;
&lt;h2&gt;Adjusting point size&lt;/h2&gt;
&lt;pre&gt;ggplot(___) + 
  geom_point(
    mapping = aes(x = ___, y = ___, 
                  color = ___, 
                  size  = ___),
    alpha  = ___
  )&lt;/pre&gt;
&lt;p&gt;For the &lt;code&gt;gapminder_2007&lt;/code&gt; dataset we can plot the GDP per capita &lt;code&gt;gdpPercap&lt;/code&gt; vs. the life expectancy as follows:&lt;/p&gt;
&lt;pre&gt;ggplot(gapminder_2007) + 
  geom_point(aes(x = gdpPercap, y = lifeExp))&lt;/pre&gt;
&lt;img src="https://www.quantargo.com/assets/courses/course-r-introduction/04-ggplot/03-scatterplot-additional-aesthetics_files/figure-html/unnamed-chunk-12-1.png"&gt;
&lt;p&gt;To adjust the point size based on the population (&lt;code&gt;pop&lt;/code&gt;) of each country we can use:&lt;/p&gt;
&lt;pre&gt;ggplot(gapminder_2007) + 
  geom_point(aes(x = gdpPercap, y = lifeExp,
                 size = pop))&lt;/pre&gt;
&lt;img src="https://www.quantargo.com/assets/courses/course-r-introduction/04-ggplot/03-scatterplot-additional-aesthetics_files/figure-html/unnamed-chunk-13-1.png"&gt;
&lt;p&gt;The point sizes in the plot above do not clearly reflect the population differences between countries. If we compare the point representing a population of 250 million people with the one representing 750 million, we can see that their sizes are not proportional. This is because the default size scale maps the smallest value to a non-zero minimum point size, so a value of 0 would not be drawn with a size of 0. To make the point area proportional to the population we can use the &lt;code&gt;scale_size_area()&lt;/code&gt; function instead. The scaling information can be added like any other ggplot object with the &lt;code&gt;+&lt;/code&gt; operator:&lt;/p&gt;
&lt;pre&gt;ggplot(gapminder_2007) + 
  geom_point(aes(x = gdpPercap, y = lifeExp,
                 size = pop)) + 
  scale_size_area(max_size = 10)&lt;/pre&gt;
&lt;img src="https://www.quantargo.com/assets/courses/course-r-introduction/04-ggplot/03-scatterplot-additional-aesthetics_files/figure-html/unnamed-chunk-14-1.png"&gt;
&lt;p&gt;Note that we have also raised &lt;code&gt;max_size&lt;/code&gt;, the size of the largest point, to &lt;code&gt;10&lt;/code&gt;, which results in bigger points overall.&lt;/p&gt;
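&lt;p&gt;To see the effect of &lt;code&gt;scale_size_area()&lt;/code&gt; in numbers, the following sketch uses a small made-up data frame (&lt;code&gt;toy&lt;/code&gt; and its columns are hypothetical and not part of the course data) and inspects the sizes ggplot2 computes with &lt;code&gt;layer_data()&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;library(ggplot2)

# Hypothetical toy data: three populations in millions
toy &lt;- data.frame(x = 1:3, y = 1, pop = c(250, 500, 750))

p_area &lt;- ggplot(toy) + 
  geom_point(aes(x = x, y = y, size = pop)) + 
  scale_size_area(max_size = 10)

# layer_data() returns the values ggplot2 computed for the layer
sizes &lt;- layer_data(p_area)$size

# The area of a point grows with the square of its size value
area_ratio &lt;- sizes^2 / max(sizes)^2
area_ratio              # relative point areas
toy$pop / max(toy$pop)  # relative populations, for comparison&lt;/pre&gt;
&lt;p&gt;With &lt;code&gt;scale_size_area()&lt;/code&gt; the two ratios should agree, which is what makes the point areas comparable.&lt;/p&gt;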
&lt;h2&gt;Exercise: Create a Gapminder scatter plot using size&lt;/h2&gt;
&lt;p&gt;Create a scatter plot with &lt;strong&gt;ggplot2&lt;/strong&gt; which shows the relationship between GDP per capita and life expectancy for the year 2007 using the &lt;code&gt;gapminder_2007&lt;/code&gt; dataset.&lt;/p&gt;
&lt;ol style="list-style-type: decimal"&gt;
&lt;li&gt;Use the &lt;code&gt;ggplot()&lt;/code&gt; function and specify the &lt;code&gt;gapminder_2007&lt;/code&gt; dataset as input&lt;/li&gt;
&lt;li&gt;Add a &lt;code&gt;geom_point()&lt;/code&gt; layer to the plot and create a scatter plot showing the GDP per capita &lt;code&gt;gdpPercap&lt;/code&gt; on the x-axis and the life expectancy &lt;code&gt;lifeExp&lt;/code&gt; on the y-axis&lt;/li&gt;
&lt;li&gt;Use the &lt;code&gt;size&lt;/code&gt; aesthetic to adjust the point size by the population &lt;code&gt;pop&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Use the &lt;code&gt;scale_size_area()&lt;/code&gt; function so that the point sizes reflect actual population differences and set the &lt;code&gt;max_size&lt;/code&gt; of each point to &lt;code&gt;10&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/04-ggplot/03-scatterplot-additional-aesthetics/exercise-03-03"&gt;Start Exercise&lt;/a&gt;
&lt;h2&gt;Setting global aesthetics: transparency&lt;/h2&gt;
&lt;pre&gt;ggplot(___) + 
  geom_point(
    mapping = aes(x = ___, y = ___, 
                  color = ___, 
                  size  = ___),
    alpha  = ___
  )&lt;/pre&gt;
&lt;img src="https://www.quantargo.com/assets/courses/course-r-introduction/04-ggplot/03-scatterplot-additional-aesthetics_files/figure-html/unnamed-chunk-16-1.png"&gt;
&lt;p&gt;Plotting many points with similar x- and y-coordinates in one graph can produce dense point clouds. Many points in these clouds are overplotted, so the true number of observations in a given area is no longer visible. As a solution, we can make the points semi-transparent using the &lt;code&gt;alpha&lt;/code&gt; parameter.&lt;/p&gt;
&lt;p&gt;Since we do &lt;strong&gt;not&lt;/strong&gt; want to set the transparency &lt;strong&gt;individually&lt;/strong&gt; for each point but &lt;strong&gt;globally&lt;/strong&gt; for all points, we do not specify &lt;code&gt;alpha&lt;/code&gt; as an aesthetic mapping within &lt;code&gt;aes()&lt;/code&gt; but as a constant parameter outside of it.&lt;/p&gt;
&lt;p&gt;Here we set the &lt;strong&gt;opacity&lt;/strong&gt; of each point to 50% by passing &lt;code&gt;alpha = 0.5&lt;/code&gt; &lt;strong&gt;outside&lt;/strong&gt; of &lt;code&gt;aes()&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;ggplot(gapminder_2007) + 
  geom_point(aes(x = gdpPercap, y = lifeExp, size = pop), 
             alpha = 0.5)&lt;/pre&gt;
&lt;img src="https://www.quantargo.com/assets/courses/course-r-introduction/04-ggplot/03-scatterplot-additional-aesthetics_files/figure-html/unnamed-chunk-17-1.png"&gt;
&lt;p&gt;With the opacity of each point set to &lt;code&gt;0.5&lt;/code&gt;, overlapping points appear darker, so we can now see where points lie on top of each other.&lt;/p&gt;
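&lt;p&gt;The difference between a constant parameter and an aesthetic mapping can be checked with a small sketch. It uses the built-in &lt;code&gt;cars&lt;/code&gt; dataset instead of the course data, and the object names are only illustrative:&lt;/p&gt;
&lt;pre&gt;library(ggplot2)

# Constants outside aes(): every point gets exactly these values
p_const &lt;- ggplot(cars) + 
  geom_point(aes(x = speed, y = dist), 
             alpha = 0.5, color = "red")

# Inside aes() the string "red" is treated as data with a single
# category, so ggplot2 assigns a color from its default palette
# and adds a legend entry
p_mapped &lt;- ggplot(cars) + 
  geom_point(aes(x = speed, y = dist, color = "red"))

const_colors  &lt;- unique(layer_data(p_const)$colour)
mapped_colors &lt;- unique(layer_data(p_mapped)$colour)
const_colors
mapped_colors&lt;/pre&gt;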
&lt;h2&gt;Quiz: Gapminder Plot&lt;/h2&gt;
&lt;pre&gt;ggplot(gapminder_2007) + 
  geom_point(aes(x = gdpPercap, y = lifeExp, size = pop, 
                 alpha = 0.5, 
                 color = "red"))&lt;/pre&gt;
&lt;img src="https://www.quantargo.com/assets/courses/course-r-introduction/04-ggplot/03-scatterplot-additional-aesthetics_files/figure-html/unnamed-chunk-18-1.png"&gt;
Which statements about the plot above are correct?
&lt;ul&gt;&lt;li&gt;Constant plot parameters should be set outside of an aesthetic mapping &lt;code&gt;aes()&lt;/code&gt;.&lt;/li&gt;&lt;li&gt;The reason for the legend entries &lt;code&gt;alpha&lt;/code&gt; and &lt;code&gt;color&lt;/code&gt; is that they are set as aesthetic mappings instead of global parameters.&lt;/li&gt;&lt;li&gt;The parameter &lt;code&gt;lifeExp&lt;/code&gt; should be set as a global parameter.&lt;/li&gt;&lt;li&gt;The parameter &lt;code&gt;gdpPercap&lt;/code&gt; should be set as a global parameter.&lt;/li&gt;&lt;/ul&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/04-ggplot/03-scatterplot-additional-aesthetics/quiz-2"&gt;Start Quiz&lt;/a&gt;
&lt;h2&gt;Exercise: Reproduce Gapminder scatter plot&lt;/h2&gt;
&lt;p&gt;Try to reproduce the following plot:&lt;/p&gt;
&lt;img src="https://www.quantargo.com/assets/courses/course-r-introduction/04-ggplot/03-scatterplot-additional-aesthetics_files/figure-html/unnamed-chunk-19-1.png"&gt;
&lt;ol style="list-style-type: decimal"&gt;
&lt;li&gt;Use the &lt;code&gt;ggplot()&lt;/code&gt; function and specify the &lt;code&gt;gapminder_2007&lt;/code&gt; dataset as input&lt;/li&gt;
&lt;li&gt;Add a &lt;code&gt;geom_point&lt;/code&gt; layer to the plot and create a scatter plot showing the GDP per capita &lt;code&gt;gdpPercap&lt;/code&gt; on the x-axis and the life expectancy &lt;code&gt;lifeExp&lt;/code&gt; on the y-axis&lt;/li&gt;
&lt;li&gt;Use the &lt;code&gt;color&lt;/code&gt; aesthetic to indicate each &lt;code&gt;continent&lt;/code&gt; by a different color&lt;/li&gt;
&lt;li&gt;Use the &lt;code&gt;size&lt;/code&gt; aesthetic to adjust the point size by the population &lt;code&gt;pop&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Use &lt;code&gt;scale_size_area()&lt;/code&gt; so that the point sizes reflect the actual population differences and set the &lt;code&gt;max_size&lt;/code&gt; of each point to &lt;code&gt;15&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Set the opacity/transparency of each point to 70% using the &lt;code&gt;alpha&lt;/code&gt; parameter&lt;/li&gt;
&lt;/ol&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/04-ggplot/03-scatterplot-additional-aesthetics/exercise-03-05"&gt;Start Exercise&lt;/a&gt;
&lt;p&gt;Specify additional aesthetics for points is an excerpt from the course &lt;a href="https://www.quantargo.com/courses/course-r-introduction"&gt;Introduction to R&lt;/a&gt;, which is available for free at &lt;a href="https://www.quantargo.com"&gt;quantargo.com&lt;/a&gt;&lt;/p&gt;&lt;p&gt;&lt;a href="https://www.quantargo.com/courses/course-r-introduction"&gt;VIEW FULL COURSE&lt;/a&gt;</description></item><item><title>Create a scatter plot with ggplot</title><link>https://www.quantargo.com/courses/course-r-introduction/04-ggplot/02-scatterplot</link><pubDate>Wed, 22 Jul 2020 07:09:41 +0000</pubDate><guid>https://www.quantargo.com/courses/course-r-introduction/04-ggplot/02-scatterplot</guid><category>R</category><category>Course</category><category>Introduction</category><category>Basics</category><category>Interactive</category><description>&lt;img src="https://www.quantargo.com/assets/courses/course-r-introduction/04-ggplot/02-scatterplot.png"&gt;
&lt;p&gt;Make your first steps with the &lt;strong&gt;ggplot2&lt;/strong&gt; package to create a scatter plot. Use the grammar-of-graphics to map data set attributes to your plot and connect different layers using the &lt;code&gt;+&lt;/code&gt; operator.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Define a dataset for the plot using the &lt;code&gt;ggplot()&lt;/code&gt; function&lt;/li&gt;
&lt;li&gt;Specify a geometric layer using the &lt;code&gt;geom_point()&lt;/code&gt; function&lt;/li&gt;
&lt;li&gt;Map attributes from the dataset to plotting properties using the &lt;code&gt;mapping&lt;/code&gt; parameter&lt;/li&gt;
&lt;li&gt;Connect different &lt;code&gt;ggplot&lt;/code&gt; objects using the &lt;code&gt;+&lt;/code&gt; operator&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;library(ggplot2)
ggplot(___) + 
  geom_point(
    mapping = aes(x = ___, y = ___)
  )&lt;/pre&gt;
&lt;h2&gt;Introduction to scatter plots&lt;/h2&gt;
&lt;p&gt;Scatter plots use points to visualize the relationship between two numeric variables. The position of each point represents the value of the variables on the x- and y-axis. Let’s see an example of a scatter plot to understand the relationship between the &lt;em&gt;speed&lt;/em&gt; and the &lt;em&gt;stopping distance&lt;/em&gt; of cars:&lt;/p&gt;
&lt;img src="https://www.quantargo.com/assets/courses/course-r-introduction/04-ggplot/02-scatterplot_files/figure-html/scatterplot-1.png"&gt;
&lt;p&gt;Each point represents a car. Each car starts to brake at the speed given on the y-axis and travels the distance shown on the x-axis until it comes to a full stop. Looking at all points in the plot, we can clearly see that faster cars need a longer distance to come to a complete stop.&lt;/p&gt;
&lt;h2&gt;Quiz: Scatter Plot Facts&lt;/h2&gt;
Which of the following statements about scatter plots are correct?
&lt;ul&gt;&lt;li&gt;Scatter plots visualize the relation of two numeric variables&lt;/li&gt;&lt;li&gt;In a scatter plot we only interpret single points and never the relationship between the variables in general&lt;/li&gt;&lt;li&gt;Scatter plots use points to visualize observations&lt;/li&gt;&lt;li&gt;Scatter plots visualize the relation of categorical and numeric variables&lt;/li&gt;&lt;/ul&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/04-ggplot/02-scatterplot/quiz-1"&gt;Start Quiz&lt;/a&gt;
&lt;h2&gt;Specifying a dataset&lt;/h2&gt;
&lt;pre&gt;library(ggplot2)
ggplot(___) + 
  geom_point(
    mapping = aes(x = ___, y = ___)
  )&lt;/pre&gt;
&lt;p&gt;To create plots with &lt;strong&gt;ggplot2&lt;/strong&gt; you first need to load the package using &lt;code&gt;library(ggplot2)&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;After the package has been loaded, specify the dataset to be used as an argument of the &lt;code&gt;ggplot()&lt;/code&gt; function. For example, to specify a plot using the &lt;code&gt;cars&lt;/code&gt; dataset you can use:&lt;/p&gt;
&lt;pre&gt;library(ggplot2)
ggplot(cars)&lt;/pre&gt;
&lt;img src="https://www.quantargo.com/assets/courses/course-r-introduction/04-ggplot/02-scatterplot_files/figure-html/unnamed-chunk-3-1.png"&gt;
&lt;p&gt;Note that this command plots nothing but a grey canvas so far. It just defines the dataset for the plot and creates an empty base on top of which we can add layers.&lt;/p&gt;
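&lt;p&gt;One way to confirm that the base plot is still empty is to count its layers. The sketch below relies on the &lt;code&gt;layers&lt;/code&gt; element of the plot object, which is an implementation detail of ggplot2 rather than something the course uses:&lt;/p&gt;
&lt;pre&gt;library(ggplot2)

# The base plot only knows the dataset
p &lt;- ggplot(cars)
n_layers_base &lt;- length(p$layers)

# Adding a geometric layer with + extends the plot by one layer
p_points &lt;- p + geom_point(aes(x = speed, y = dist))
n_layers_points &lt;- length(p_points$layers)

n_layers_base
n_layers_points&lt;/pre&gt;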
&lt;h2&gt;Exercise: Specify the gapminder dataset&lt;/h2&gt;
&lt;p&gt;To start with a ggplot visualizing the &lt;code&gt;gapminder&lt;/code&gt; dataset we need to:&lt;/p&gt;
&lt;ol style="list-style-type: decimal"&gt;
&lt;li&gt;Load the &lt;strong&gt;ggplot2&lt;/strong&gt; package&lt;/li&gt;
&lt;li&gt;Load the &lt;strong&gt;gapminder&lt;/strong&gt; package&lt;/li&gt;
&lt;li&gt;Define the &lt;code&gt;gapminder&lt;/code&gt; dataset to be used in the plot with the &lt;code&gt;ggplot()&lt;/code&gt; function&lt;/li&gt;
&lt;/ol&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/04-ggplot/02-scatterplot/exercise-02-02"&gt;Start Exercise&lt;/a&gt;
&lt;h2&gt;Specifying a geometric layer&lt;/h2&gt;
&lt;pre&gt;library(ggplot2)
ggplot(___) + 
  geom_point(
    mapping = aes(x = ___, y = ___)
  )&lt;/pre&gt;
&lt;p&gt;We can use the geometric layers (or &lt;em&gt;geoms&lt;/em&gt;) of &lt;strong&gt;ggplot2&lt;/strong&gt; to define how we want to visualize our dataset. &lt;em&gt;Geoms&lt;/em&gt; use geometric objects to visualize the variables of a dataset. The objects can take multiple forms like points, lines and bars and are specified through the corresponding functions &lt;code&gt;geom_point()&lt;/code&gt;, &lt;code&gt;geom_line()&lt;/code&gt; and &lt;code&gt;geom_col()&lt;/code&gt;:&lt;/p&gt;
&lt;img src="https://www.quantargo.com/assets/courses/course-r-introduction/04-ggplot/02-scatterplot_files/figure-html/geom_point-1.png"&gt;
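&lt;p&gt;As a rough illustration, the same data can be drawn with each of the three geoms. The data frame &lt;code&gt;toy&lt;/code&gt; below is made up for this sketch and is not the data behind the figure above:&lt;/p&gt;
&lt;pre&gt;library(ggplot2)

# Hypothetical toy data for illustration
toy &lt;- data.frame(x = 1:5, y = c(2, 4, 3, 6, 5))

# Same data and mapping, three different geometric objects
p_points &lt;- ggplot(toy) + geom_point(mapping = aes(x = x, y = y))
p_line   &lt;- ggplot(toy) + geom_line(mapping = aes(x = x, y = y))
p_bars   &lt;- ggplot(toy) + geom_col(mapping = aes(x = x, y = y))

p_points
p_line
p_bars&lt;/pre&gt;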
&lt;h2&gt;Quiz: Scatter Plot Layers&lt;/h2&gt;
Which geometric layer should be used to create scatter plots in &lt;strong&gt;ggplot2&lt;/strong&gt;?
&lt;ul&gt;&lt;li&gt;&lt;code&gt;point_geom()&lt;/code&gt;&lt;/li&gt;&lt;li&gt;&lt;code&gt;geom()&lt;/code&gt;&lt;/li&gt;&lt;li&gt;&lt;code&gt;geom_scatter()&lt;/code&gt;&lt;/li&gt;&lt;li&gt;&lt;code&gt;geom_point()&lt;/code&gt;&lt;/li&gt;&lt;/ul&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/04-ggplot/02-scatterplot/quiz-2"&gt;Start Quiz&lt;/a&gt;
&lt;h2&gt;Creating aesthetic mappings&lt;/h2&gt;
&lt;pre&gt;library(ggplot2)
ggplot(___) + 
  geom_point(
    mapping = aes(x = ___, y = ___)
  )&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;ggplot2&lt;/strong&gt; uses the concept of &lt;em&gt;aesthetics&lt;/em&gt;, which &lt;em&gt;map&lt;/em&gt; dataset attributes to the visual features of the plot. Each geometric layer requires a different set of &lt;em&gt;aesthetic mappings&lt;/em&gt;, e.g. the &lt;code&gt;geom_point()&lt;/code&gt; function uses the aesthetics &lt;code&gt;x&lt;/code&gt; and &lt;code&gt;y&lt;/code&gt; to determine the x- and y-axis coordinates of the points to plot. The aesthetics are mapped within the &lt;code&gt;aes()&lt;/code&gt; function to construct the final mappings.&lt;/p&gt;
&lt;p&gt;To specify a layer of points which plots the variable &lt;code&gt;speed&lt;/code&gt; on the x-axis and distance &lt;code&gt;dist&lt;/code&gt; on the y-axis we can write:&lt;/p&gt;
&lt;pre&gt;geom_point(
  mapping = aes(x=speed, y=dist)
)&lt;/pre&gt;
&lt;p&gt;The expression above constructs a geometric layer. However, this layer is currently not linked to a dataset and does not produce a plot. To &lt;strong&gt;link&lt;/strong&gt; the layer with a &lt;code&gt;ggplot&lt;/code&gt; object specifying the &lt;code&gt;cars&lt;/code&gt; dataset we need to connect the &lt;code&gt;ggplot(cars)&lt;/code&gt; object with the &lt;code&gt;geom_point()&lt;/code&gt; layer using the &lt;code&gt;+&lt;/code&gt; operator:&lt;/p&gt;
&lt;pre&gt;ggplot(cars) + 
  geom_point(
    mapping = aes(x=speed, y=dist)
  )&lt;/pre&gt;
&lt;img src="https://www.quantargo.com/assets/courses/course-r-introduction/04-ggplot/02-scatterplot_files/figure-html/unnamed-chunk-7-1.png"&gt;
&lt;p&gt;Because the layer is now linked to the plot object, &lt;code&gt;ggplot()&lt;/code&gt; knows that the mapped &lt;code&gt;speed&lt;/code&gt; and &lt;code&gt;dist&lt;/code&gt; variables are taken from the &lt;code&gt;cars&lt;/code&gt; dataset. &lt;code&gt;geom_point()&lt;/code&gt; instructs ggplot to plot the mapped variables as points.&lt;/p&gt;
&lt;p&gt;The required steps to create a scatter plot with &lt;code&gt;ggplot&lt;/code&gt; can be summarized as follows:&lt;/p&gt;
&lt;ol style="list-style-type: decimal"&gt;
&lt;li&gt;Load the package &lt;strong&gt;ggplot2&lt;/strong&gt; using &lt;code&gt;library(ggplot2)&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Specify the dataset to be plotted using &lt;code&gt;ggplot()&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Use the &lt;code&gt;+&lt;/code&gt; operator to add layers to the plot.&lt;/li&gt;
&lt;li&gt;Add a geometric layer to define the shapes to be plotted. In case of scatter plots, use &lt;code&gt;geom_point()&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Map variables from the dataset to plotting properties through the &lt;code&gt;mapping&lt;/code&gt; parameter in the geometric layer.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;Exercise: Visualize the “cars” dataset&lt;/h2&gt;
&lt;p&gt;Create a scatter plot using &lt;code&gt;ggplot()&lt;/code&gt; and visualize the &lt;code&gt;cars&lt;/code&gt; dataset with the &lt;code&gt;speed&lt;/code&gt; of the car on the x-axis and the car’s stopping distance &lt;code&gt;dist&lt;/code&gt; on the y-axis.&lt;/p&gt;
&lt;p&gt;The &lt;strong&gt;ggplot2&lt;/strong&gt; package is already loaded. Follow these steps to create the plot:&lt;/p&gt;
&lt;ol style="list-style-type: decimal"&gt;
&lt;li&gt;Specify the dataset through the &lt;code&gt;ggplot()&lt;/code&gt; function&lt;/li&gt;
&lt;li&gt;Specify a geometric point layer with the &lt;code&gt;geom_point()&lt;/code&gt; function&lt;/li&gt;
&lt;li&gt;Map the &lt;code&gt;speed&lt;/code&gt; to the x-axis and the &lt;code&gt;dist&lt;/code&gt; to the y-axis with &lt;code&gt;aes()&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/04-ggplot/02-scatterplot/exercise-02-03"&gt;Start Exercise&lt;/a&gt;
&lt;h2&gt;Exercise: Visualize the Gapminder dataset&lt;/h2&gt;
&lt;p&gt;Create a scatter plot using &lt;code&gt;ggplot()&lt;/code&gt; and visualize the &lt;code&gt;gapminder_2007&lt;/code&gt; dataset with the GDP per capita &lt;code&gt;gdpPercap&lt;/code&gt; on the x-axis and the life expectancy &lt;code&gt;lifeExp&lt;/code&gt; of each country on the y-axis.&lt;/p&gt;
&lt;p&gt;The &lt;strong&gt;ggplot2&lt;/strong&gt; package is already loaded. Follow these steps to create the plot:&lt;/p&gt;
&lt;ol style="list-style-type: decimal"&gt;
&lt;li&gt;Specify the &lt;code&gt;gapminder_2007&lt;/code&gt; dataset through the &lt;code&gt;ggplot()&lt;/code&gt; function&lt;/li&gt;
&lt;li&gt;Specify a geometric point layer with &lt;code&gt;geom_point()&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Map the &lt;code&gt;gdpPercap&lt;/code&gt; to the x-axis and the &lt;code&gt;lifeExp&lt;/code&gt; to the y-axis with &lt;code&gt;aes()&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/04-ggplot/02-scatterplot/exercise-02-04"&gt;Start Exercise&lt;/a&gt;
&lt;p&gt;Create a scatter plot with ggplot is an excerpt from the course &lt;a href="https://www.quantargo.com/courses/course-r-introduction"&gt;Introduction to R&lt;/a&gt;, which is available for free at &lt;a href="https://www.quantargo.com"&gt;quantargo.com&lt;/a&gt;&lt;/p&gt;&lt;p&gt;&lt;a href="https://www.quantargo.com/courses/course-r-introduction"&gt;VIEW FULL COURSE&lt;/a&gt;</description></item><item><title>Why data visualization is important</title><link>https://www.quantargo.com/courses/course-r-introduction/04-ggplot/01-introduction-data-visualization</link><pubDate>Wed, 15 Jul 2020 13:24:16 +0000</pubDate><guid>https://www.quantargo.com/courses/course-r-introduction/04-ggplot/01-introduction-data-visualization</guid><category>R</category><category>Course</category><category>Introduction</category><category>Basics</category><category>Interactive</category><description>&lt;img src="https://www.quantargo.com/assets/courses/course-r-introduction/04-ggplot/01-introduction-data-visualization.png"&gt;
&lt;p&gt;Data visualization is not only important to communicate results but also a powerful technique for exploratory data analysis. Each plot type like scatter plots, line graphs, bar charts and histograms has its own purpose and can be leveraged in a powerful way using the &lt;strong&gt;ggplot2&lt;/strong&gt; package.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Understand the different roles of data visualization&lt;/li&gt;
&lt;li&gt;Understand the different plot types available&lt;/li&gt;
&lt;li&gt;Get an overview of the &lt;strong&gt;ggplot2&lt;/strong&gt; package.&lt;/li&gt;
&lt;/ul&gt;

&lt;img src="https://www.quantargo.com/assets/courses/course-r-introduction/04-ggplot/01-introduction-data-visualization_files/figure-html/geom_point-1.png"&gt;
&lt;h2&gt;Introduction to data visualization&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;A picture is worth a thousand words.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Data visualization is the quickest and most powerful technique to understand new and existing information. During an initial exploration phase data scientists try to reveal the underlying features of a dataset like different distributions, correlations or other visible patterns. This process is also called &lt;em&gt;exploratory data analysis&lt;/em&gt; (EDA) and marks the starting point of each data science project.&lt;/p&gt;
&lt;p&gt;The graphs produced during the EDA show the data scientist the directions of the journey ahead. Revealed patterns can inspire hypotheses about the underlying processes, features of the dataset to be extracted or modelling techniques to be tested. Last but not least, visualizations uncover outliers and data errors which the data scientist needs to take care of.&lt;/p&gt;
&lt;p&gt;The biggest role for data visualization is the communication of data science findings to colleagues and customers through presentations, reports or dashboards. Effort spent on EDA and visualizations is time well spent since results can be directly used to communicate findings.&lt;/p&gt;
&lt;h2&gt;Quiz: Visualization Phase&lt;/h2&gt;
For which phases is data visualization important in the data science workflow?
&lt;ul&gt;&lt;li&gt;Exploratory Data Analysis (EDA).&lt;/li&gt;&lt;li&gt;Detection of outliers.&lt;/li&gt;&lt;li&gt;Communication of Results.&lt;/li&gt;&lt;/ul&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/04-ggplot/01-introduction-data-visualization/quiz-1"&gt;Start Quiz&lt;/a&gt;
&lt;h2&gt;Available Plot Types&lt;/h2&gt;
&lt;p&gt;There are many plot types available which help to understand different features and relationships in the dataset.&lt;/p&gt;
&lt;p&gt;During the exploratory data analysis phase we typically want to detect the most obvious patterns by looking at each variable in isolation or by examining relationships between variables. The plot type to use also depends on the data type of the input variables, such as numeric or categorical.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Scatter Plots&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Scatter plots are used to visualize the relationship between two numeric variables. The position of each point represents the value of the variables on the x and y-axis.&lt;/p&gt;
&lt;img src="https://www.quantargo.com/assets/courses/course-r-introduction/04-ggplot/01-introduction-data-visualization_files/figure-html/scatterplot-1.png"&gt;
&lt;p&gt;&lt;strong&gt;Line Graphs&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Line graphs visualize the trajectory of one numeric variable against another, with consecutive points connected by lines. They are well suited if values change &lt;strong&gt;continuously&lt;/strong&gt; - like temperature over time.&lt;/p&gt;
&lt;img src="https://www.quantargo.com/assets/courses/course-r-introduction/04-ggplot/01-introduction-data-visualization_files/figure-html/unnamed-chunk-2-1.png"&gt;
&lt;p&gt;&lt;strong&gt;Bar Charts and Histograms&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Bar charts visualize &lt;code&gt;numeric&lt;/code&gt; values grouped by categories. Each category is represented by one bar with a height defined by each &lt;code&gt;numeric&lt;/code&gt; value. Histograms are specific bar charts to summarize the number of occurrences of numeric values over a set of value ranges (or bins). They are typically used to determine the &lt;em&gt;distribution&lt;/em&gt; of numeric values.&lt;/p&gt;
&lt;img src="https://www.quantargo.com/assets/courses/course-r-introduction/04-ggplot/01-introduction-data-visualization_files/figure-html/unnamed-chunk-3-1.png"&gt;
&lt;p&gt;&lt;strong&gt;Others&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Other frequently used plot types in data science include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Box plots&lt;/strong&gt;: Show distributional information of numeric values grouped in categories as boxes. Great to quickly compare multiple distributions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Violin plots&lt;/strong&gt;: Same as box plots but show distributions as &lt;em&gt;violins&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Heat Maps&lt;/strong&gt;: Show interactions of variables - typically correlations - as a rastered image highlighting areas of high interaction.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Network Graphs&lt;/strong&gt;: Show connections between nodes&lt;/li&gt;
&lt;/ul&gt;

&lt;img src="https://www.quantargo.com/assets/courses/course-r-introduction/04-ggplot/01-introduction-data-visualization_files/figure-html/unnamed-chunk-4-1.png"&gt;
&lt;h2&gt;Quiz: Distribution Comparison Plots&lt;/h2&gt;
Which plot types are typically used to compare distributions of numeric variables?
&lt;ul&gt;&lt;li&gt;Box plots&lt;/li&gt;&lt;li&gt;Network graphs&lt;/li&gt;&lt;li&gt;Violin plots&lt;/li&gt;&lt;li&gt;Line Graphs&lt;/li&gt;&lt;/ul&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/04-ggplot/01-introduction-data-visualization/quiz-2"&gt;Start Quiz&lt;/a&gt;
&lt;h2&gt;Introducing: ggplot2&lt;/h2&gt;
&lt;p&gt;Due to the importance of visualization for data science and statistics, R offers a rich set of tools and packages. The core R language already provides many plotting functions and plot types. These plotting functions require users to specify how to plot each element on the canvas step by step. By contrast, the &lt;strong&gt;ggplot2&lt;/strong&gt; package allows the specification of plots through a set of plotting &lt;em&gt;layers&lt;/em&gt;. The package then figures out the steps required to produce the graph.&lt;/p&gt;
&lt;p&gt;Through the pre-defined set of geometric layers, facets and themes &lt;strong&gt;ggplot2&lt;/strong&gt; enables users to create beautiful graphs in a very short time. &lt;strong&gt;ggplot2&lt;/strong&gt; is also the most widely adopted plotting library in the R community.&lt;/p&gt;
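&lt;p&gt;The contrast between the two styles can be sketched as follows. The example is only an illustration under the assumption that the built-in &lt;code&gt;mtcars&lt;/code&gt; dataset is a reasonable stand-in; the choice of columns and the straight-line fit are arbitrary.&lt;/p&gt;
&lt;pre&gt;library(ggplot2)

# Base R: draw the points first, then add further elements step by step.
plot(mtcars$wt, mtcars$mpg, xlab = "wt", ylab = "mpg")
abline(lm(mpg ~ wt, data = mtcars))

# ggplot2: declare the data, the mappings and the layers;
# the package works out how to draw the result.
p &lt;- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  theme_minimal()
p&lt;/pre&gt;
&lt;p&gt;Adding or removing a layer in the &lt;strong&gt;ggplot2&lt;/strong&gt; version only changes one line, while the rest of the plot specification stays the same.&lt;/p&gt;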
&lt;h2&gt;Quiz: ggplot2 Facts&lt;/h2&gt;
Which statements about data visualization and &lt;strong&gt;ggplot2&lt;/strong&gt; are correct?
&lt;ul&gt;&lt;li&gt;&lt;strong&gt;ggplot2&lt;/strong&gt; is the only way to create plots in R.&lt;/li&gt;&lt;li&gt;&lt;strong&gt;ggplot2&lt;/strong&gt; facilitates the creation of good looking graphs quickly.&lt;/li&gt;&lt;li&gt;&lt;strong&gt;ggplot2&lt;/strong&gt; requires users to specify the plotting commands in a step-by-step fashion.&lt;/li&gt;&lt;li&gt;&lt;strong&gt;ggplot2&lt;/strong&gt; enables users to specify plots in a declarative way.&lt;/li&gt;&lt;/ul&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/04-ggplot/01-introduction-data-visualization/quiz-3"&gt;Start Quiz&lt;/a&gt;
&lt;p&gt;Why data visualization is important is an excerpt from the course &lt;a href="https://www.quantargo.com/courses/course-r-introduction"&gt;Introduction to R&lt;/a&gt;, which is available for free at &lt;a href="https://www.quantargo.com"&gt;quantargo.com&lt;/a&gt;&lt;/p&gt;&lt;p&gt;&lt;a href="https://www.quantargo.com/courses/course-r-introduction"&gt;VIEW FULL COURSE&lt;/a&gt;</description></item><item><title>Create a data transformation pipeline</title><link>https://www.quantargo.com/courses/course-r-introduction/03-dplyr/05-data-transformation-pipeline</link><pubDate>Mon, 06 Jul 2020 21:59:52 +0000</pubDate><guid>https://www.quantargo.com/courses/course-r-introduction/03-dplyr/05-data-transformation-pipeline</guid><category>R</category><category>Course</category><category>Introduction</category><category>Basics</category><category>Interactive</category><description>&lt;img src="https://www.quantargo.com/assets/courses/course-r-introduction/03-dplyr/05-data-transformation-pipeline.png"&gt;
&lt;p&gt;All data transformation functions in &lt;strong&gt;dplyr&lt;/strong&gt; can be connected through the pipe &lt;code&gt;%&amp;gt;%&lt;/code&gt; operator to create powerful and yet expressive data transformation pipelines.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Use the pipe operator &lt;code&gt;%&amp;gt;%&lt;/code&gt; to combine multiple &lt;strong&gt;dplyr&lt;/strong&gt; functions into one pipeline&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;my_data_frame&gt; %&gt;%
  filter(___) %&gt;%
  select(___) %&gt;%
  arrange(___)&lt;/pre&gt;
&lt;h2&gt;Using the %&gt;% operator&lt;/h2&gt;
&lt;p&gt;The pipe operator &lt;code&gt;%&amp;gt;%&lt;/code&gt; is a special part of the &lt;code&gt;tidyverse&lt;/code&gt;. It is used to combine multiple functions and run them one after the other. In this setting the input of each function is the output of the previous function. Imagine we have the &lt;code&gt;pres_results&lt;/code&gt; data frame and want to create a smaller, more transparent data frame for answering the question: In which states was the Democratic Party the most popular choice in the 2016 US presidential election? To accomplish this task we would need to take the following steps:&lt;/p&gt;
&lt;ol style="list-style-type: decimal"&gt;
&lt;li&gt;
&lt;code&gt;filter()&lt;/code&gt; the data frame for the rows where the &lt;code&gt;year&lt;/code&gt; variable equals 2016.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;select()&lt;/code&gt; the two variables &lt;code&gt;state&lt;/code&gt; and &lt;code&gt;dem&lt;/code&gt;, since we are not interested in the rest of the columns.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;arrange()&lt;/code&gt; the filtered and selected data frame based on the &lt;code&gt;dem&lt;/code&gt; column in a descending way.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The steps and functions described above should be run one after the other, where the input of each function is the output of the previous step. Applying the things you learned so far, you could accomplish this task by taking the following steps:&lt;/p&gt;
&lt;pre&gt;result &lt;- filter(pres_results, year==2016)
result &lt;- select(result, state, dem)
result &lt;- arrange(result, desc(dem))
result&lt;/pre&gt;
&lt;pre&gt;# A tibble: 51 x 2
  state   dem
  &lt;chr&gt; &lt;dbl&gt;
1 DC    0.905
2 CA    0.617
3 HI    0.610
# … with 48 more rows&lt;/pre&gt;
&lt;p&gt;The first function takes the &lt;code&gt;pres_results&lt;/code&gt; data frame, filters it according to the task description and assigns it to the variable &lt;code&gt;result&lt;/code&gt;. Then, each subsequent function takes the &lt;code&gt;result&lt;/code&gt; variable as input and overwrites it with its own output.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;%&amp;gt;%&lt;/code&gt; operator provides a practical way of combining the steps above into what looks like a single step. It takes a data frame as the initial input. Then, it applies a sequence of functions and passes on the output of each function as the input for the next function. The same task as above can be accomplished using the pipe operator &lt;code&gt;%&amp;gt;%&lt;/code&gt; like this:&lt;/p&gt;
&lt;pre&gt;pres_results %&gt;%
  filter(year==2016) %&gt;%
  select(state, dem, rep) %&gt;%
  arrange(desc(dem))&lt;/pre&gt;
&lt;pre&gt;# A tibble: 51 x 3
  state   dem    rep
  &lt;chr&gt; &lt;dbl&gt;  &lt;dbl&gt;
1 DC    0.905 0.0407
2 CA    0.617 0.316 
3 HI    0.610 0.294 
# … with 48 more rows&lt;/pre&gt;
&lt;p&gt;We can interpret the code in the following way:&lt;/p&gt;
&lt;ol style="list-style-type: decimal"&gt;
&lt;li&gt;We define the original data set as a starting point.&lt;/li&gt;
&lt;li&gt;Using the &lt;code&gt;%&amp;gt;%&lt;/code&gt; operator right after the data frame tells &lt;strong&gt;dplyr&lt;/strong&gt; that a function is coming which takes the previously defined data frame as input.&lt;/li&gt;
&lt;li&gt;We use each function as usual, but skip the first parameter. The data frame input is automatically provided by the output of the previous step.&lt;/li&gt;
&lt;li&gt;As long as we add the &lt;code&gt;%&amp;gt;%&lt;/code&gt; operator after a step, &lt;strong&gt;dplyr&lt;/strong&gt; will expect an additional step.&lt;/li&gt;
&lt;li&gt;In our example the pipeline closes with an &lt;code&gt;arrange()&lt;/code&gt; function. It gets the filtered and selected version of the &lt;code&gt;pres_results&lt;/code&gt; data frame as input and sorts it based on the &lt;code&gt;dem&lt;/code&gt; column in a descending way. Finally, it gives back the output.&lt;/li&gt;
&lt;/ol&gt;
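&lt;p&gt;The interpretation above can be checked on a small example: a pipeline should give the same result as the equivalent nested function calls. The sketch below assumes a hypothetical tibble named &lt;code&gt;votes&lt;/code&gt; with made-up values; it only stands in for &lt;code&gt;pres_results&lt;/code&gt; and is not part of the course data.&lt;/p&gt;
&lt;pre&gt;library(dplyr)

# Hypothetical toy data (made-up values, not real election results).
votes &lt;- tibble(
  state = c("AA", "BB", "CC", "AA"),
  year  = c(2016, 2016, 2016, 2012),
  dem   = c(0.40, 0.60, 0.50, 0.45)
)

# Nested calls: each data frame is passed explicitly as the first argument,
# so the code has to be read from the inside out.
nested &lt;- arrange(select(filter(votes, year == 2016), state, dem), desc(dem))

# Piped calls: the first argument is supplied by the previous step,
# so the code reads from top to bottom.
piped &lt;- votes %&gt;%
  filter(year == 2016) %&gt;%
  select(state, dem) %&gt;%
  arrange(desc(dem))

# Compare both results.
all.equal(nested, piped)&lt;/pre&gt;
&lt;p&gt;Both versions run the same three functions in the same order; the pipe only changes how the code is written and read.&lt;/p&gt;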

&lt;p&gt;One difference between the two approaches is that the &lt;code&gt;%&amp;gt;%&lt;/code&gt; operator does not permanently save the intermediate or the final results. To save the resulting data frame we need to assign the output to a variable:&lt;/p&gt;
&lt;pre&gt;result &lt;- pres_results %&gt;%
  filter(year==2016) %&gt;%
  select(state, dem) %&gt;%
  arrange(desc(dem))

result&lt;/pre&gt;
&lt;pre&gt;# A tibble: 51 x 2
  state   dem
  &lt;chr&gt; &lt;dbl&gt;
1 DC    0.905
2 CA    0.617
3 HI    0.610
# … with 48 more rows&lt;/pre&gt;
&lt;h2&gt;Exercise: Austrian Life Expectancy&lt;/h2&gt;
&lt;p&gt;Use the &lt;code&gt;%&amp;gt;%&lt;/code&gt; operator on the &lt;code&gt;gapminder&lt;/code&gt; data set and create a simple data frame to answer the following question: How did the life expectancy in Austria change over the last decades? Required packages are already loaded.&lt;/p&gt;
&lt;ol style="list-style-type: decimal"&gt;
&lt;li&gt;Define the &lt;code&gt;gapminder&lt;/code&gt; data frame as the base data frame&lt;/li&gt;
&lt;li&gt;Filter only the rows where the &lt;code&gt;country&lt;/code&gt; column is equal to &lt;code&gt;Austria&lt;/code&gt; by piping &lt;code&gt;gapminder&lt;/code&gt; to the &lt;code&gt;filter()&lt;/code&gt; function.&lt;/li&gt;
&lt;li&gt;Select only the columns: &lt;code&gt;year&lt;/code&gt; and &lt;code&gt;lifeExp&lt;/code&gt; from the filtered result.&lt;/li&gt;
&lt;li&gt;Arrange the selected columns based on the &lt;code&gt;year&lt;/code&gt; column.&lt;/li&gt;
&lt;/ol&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/03-dplyr/05-data-transformation-pipeline/exercise-03-05-00"&gt;Start Exercise&lt;/a&gt;
&lt;h2&gt;Exercise: European GDP Per Capita&lt;/h2&gt;
&lt;p&gt;Use the &lt;code&gt;%&amp;gt;%&lt;/code&gt; operator on the &lt;code&gt;gapminder&lt;/code&gt; dataset and create a simple tibble to answer the following question: Which European country had the highest GDP per capita in 2007? Required packages are already loaded.&lt;/p&gt;
&lt;ol style="list-style-type: decimal"&gt;
&lt;li&gt;Define the &lt;code&gt;gapminder&lt;/code&gt; tibble as the input&lt;/li&gt;
&lt;li&gt;Filter only the rows where the &lt;code&gt;year&lt;/code&gt; column is equal to &lt;code&gt;2007&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Use a second layer of filter and keep only the rows where the &lt;code&gt;continent&lt;/code&gt; column is equal to &lt;code&gt;Europe&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Select only the columns: &lt;code&gt;country&lt;/code&gt; and &lt;code&gt;gdpPercap&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Arrange the results based on the &lt;code&gt;gdpPercap&lt;/code&gt; column in a descending way&lt;/li&gt;
&lt;/ol&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/03-dplyr/05-data-transformation-pipeline/exercise-03-05-01"&gt;Start Exercise&lt;/a&gt;
&lt;h2&gt;Exercise: Americas Population&lt;/h2&gt;
&lt;p&gt;Use the &lt;code&gt;%&amp;gt;%&lt;/code&gt; operator on the &lt;code&gt;gapminder&lt;/code&gt; dataset and create a simple tibble to answer the following question: Which country on the continent &lt;code&gt;Americas&lt;/code&gt; had the largest population in 2007?&lt;/p&gt;
&lt;ol style="list-style-type: decimal"&gt;
&lt;li&gt;Define the &lt;code&gt;gapminder&lt;/code&gt; tibble as the input&lt;/li&gt;
&lt;li&gt;Filter only the rows where the &lt;code&gt;year&lt;/code&gt; column is equal to &lt;code&gt;2007&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Use a second layer of filter and keep only the rows where the &lt;code&gt;continent&lt;/code&gt; column is equal to &lt;code&gt;Americas&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Select only the columns: &lt;code&gt;country&lt;/code&gt; and &lt;code&gt;pop&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Arrange the results based on the &lt;code&gt;pop&lt;/code&gt; column in a descending way&lt;/li&gt;
&lt;/ol&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/03-dplyr/05-data-transformation-pipeline/exercise-03-05-02"&gt;Start Exercise&lt;/a&gt;
&lt;h2&gt;Quiz: Malformed Code&lt;/h2&gt;
&lt;pre&gt;gapminder %&gt;%
  filter(year == 2007, continent == "Americas") %&gt;%
  select(gapminder, country, pop) %&gt;%
  arrange(desc(pop)) %&gt;%&lt;/pre&gt;
Take a look at the code above. What mistakes does it contain?
&lt;ul&gt;&lt;li&gt;The &lt;code&gt;gapminder&lt;/code&gt; tibble should not be defined in the &lt;code&gt;select()&lt;/code&gt; function.&lt;/li&gt;&lt;li&gt;There should be no &lt;code&gt;%&amp;gt;%&lt;/code&gt; applied after the last line.&lt;/li&gt;&lt;li&gt;There will be no output, because you cannot use these functions in this order.&lt;/li&gt;&lt;li&gt;The &lt;code&gt;desc()&lt;/code&gt; function should be applied on the whole &lt;code&gt;arrange()&lt;/code&gt; function and not on a single column.&lt;/li&gt;&lt;/ul&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/03-dplyr/05-data-transformation-pipeline/quiz-1"&gt;Start Quiz&lt;/a&gt;
&lt;p&gt;Create a data transformation pipeline is an excerpt from the course &lt;a href="https://www.quantargo.com/courses/course-r-introduction"&gt;Introduction to R&lt;/a&gt;, which is available for free at &lt;a href="https://www.quantargo.com"&gt;quantargo.com&lt;/a&gt;&lt;/p&gt;&lt;p&gt;&lt;a href="https://www.quantargo.com/courses/course-r-introduction"&gt;VIEW FULL COURSE&lt;/a&gt;</description></item><item><title>Sort data frames by columns</title><link>https://www.quantargo.com/courses/course-r-introduction/03-dplyr/04-sort-data-frame-by-columns</link><pubDate>Thu, 02 Jul 2020 10:48:36 +0000</pubDate><guid>https://www.quantargo.com/courses/course-r-introduction/03-dplyr/04-sort-data-frame-by-columns</guid><category>R</category><category>Course</category><category>Introduction</category><category>Basics</category><category>Interactive</category><description>&lt;img src="https://www.quantargo.com/assets/courses/course-r-introduction/03-dplyr/04-sort-data-frame-by-columns.png"&gt;
&lt;p&gt;To select areas of interest in a data frame they often need to be ordered by specific columns. The &lt;strong&gt;dplyr&lt;/strong&gt; &lt;code&gt;arrange()&lt;/code&gt; function supports data frame orderings by multiple columns in ascending and descending order.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Use the &lt;code&gt;arrange()&lt;/code&gt; function to sort data frames.&lt;/li&gt;
&lt;li&gt;Sort data frames by multiple columns using &lt;code&gt;arrange()&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;arrange(&lt;my_data_frame&gt;, &lt;column_one&gt;)
arrange(&lt;my_data_frame&gt;, &lt;column_one&gt;, &lt;column_two&gt;, ...)&lt;/pre&gt;
&lt;h2&gt;The arrange() function with a single column&lt;/h2&gt;
&lt;p&gt;The &lt;code&gt;arrange()&lt;/code&gt; function orders the rows of a data frame. It takes a data frame or a tibble as the first parameter and the names of the columns based on which the rows should be ordered as additional parameters. Let’s assume we want to answer the question: &lt;em&gt;Which states had the highest percentage of Republican voters in the 2016 US presidential election?&lt;/em&gt; To answer this question, the following example uses the &lt;code&gt;pres_results_2016&lt;/code&gt; data frame, which contains information only for the 2016 US presidential election. We &lt;code&gt;arrange()&lt;/code&gt; the data frame based on the &lt;code&gt;rep&lt;/code&gt; column (share of Republican votes):&lt;/p&gt;
&lt;pre&gt;arrange(pres_results_2016, rep)&lt;/pre&gt;
&lt;pre&gt;# A tibble: 51 x 6
   year state total_votes   dem    rep  other
  &lt;dbl&gt; &lt;chr&gt;       &lt;dbl&gt; &lt;dbl&gt;  &lt;dbl&gt;  &lt;dbl&gt;
1  2016 DC         312575 0.905 0.0407 0.0335
2  2016 HI         437664 0.610 0.294  0.0958
3  2016 VT         320467 0.557 0.298  0.0737
# … with 48 more rows&lt;/pre&gt;
&lt;p&gt;As you can see in the output, the data frame is sorted in an ascending order based on the &lt;code&gt;rep&lt;/code&gt; column. However, we would prefer to have the results in a descending order, so that we can instantly see the &lt;code&gt;state&lt;/code&gt; with the highest &lt;code&gt;rep&lt;/code&gt; percentage. To sort a column in a descending order, all we need to do is apply the &lt;code&gt;desc()&lt;/code&gt; function on the given column inside the &lt;code&gt;arrange()&lt;/code&gt; function:&lt;/p&gt;
&lt;pre&gt;arrange(pres_results_2016, desc(rep))&lt;/pre&gt;
&lt;pre&gt;# A tibble: 51 x 6
   year state total_votes   dem   rep  other
  &lt;dbl&gt; &lt;chr&gt;       &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;  &lt;dbl&gt;
1  2016 WV         713051 0.265 0.686 0.0489
2  2016 WY         258788 0.216 0.674 0.0830
3  2016 OK        1452992 0.289 0.653 0.0575
# … with 48 more rows&lt;/pre&gt;
&lt;p&gt;Arranging is not only possible on numeric values, but on character values as well. In that case, &lt;strong&gt;dplyr&lt;/strong&gt; sorts the rows in alphabetic order. We can arrange character columns just like numeric ones:&lt;/p&gt;
&lt;pre&gt;arrange(pres_results_2016, state)&lt;/pre&gt;
&lt;pre&gt;# A tibble: 51 x 6
   year state total_votes   dem   rep  other
  &lt;dbl&gt; &lt;chr&gt;       &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;  &lt;dbl&gt;
1  2016 AK         318608 0.366 0.513 0.0928
2  2016 AL        2123372 0.344 0.621 0.0254
3  2016 AR        1130635 0.337 0.606 0.0577
# … with 48 more rows&lt;/pre&gt;
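&lt;p&gt;The three variants shown so far can be tried on a small, self-contained example. The tibble &lt;code&gt;scores&lt;/code&gt; below is hypothetical and its values are made up purely for illustration.&lt;/p&gt;
&lt;pre&gt;library(dplyr)

# Hypothetical toy data (made-up values).
scores &lt;- tibble(
  name  = c("Cleo", "Abe", "Bea"),
  score = c(0.7, 0.9, 0.4)
)

arrange(scores, score)        # ascending: lowest score first
arrange(scores, desc(score))  # descending: highest score first
arrange(scores, name)         # character column: alphabetical order&lt;/pre&gt;
&lt;p&gt;Note that &lt;code&gt;arrange()&lt;/code&gt; returns a new, sorted tibble each time; &lt;code&gt;scores&lt;/code&gt; itself stays unchanged unless the result is assigned back to it.&lt;/p&gt;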
&lt;h2&gt;Exercise: Use arrange() based on a single column&lt;/h2&gt;
&lt;p&gt;The &lt;code&gt;gapminder_2007&lt;/code&gt; dataset contains economic and demographic data about various countries for the year 2007. Arrange the tibble and inspect which country had the lowest life expectancy &lt;code&gt;lifeExp&lt;/code&gt; in 2007! The &lt;strong&gt;dplyr&lt;/strong&gt; package is already loaded.&lt;/p&gt;
&lt;ol style="list-style-type: decimal"&gt;
&lt;li&gt;Apply the &lt;code&gt;arrange()&lt;/code&gt; function on the &lt;code&gt;gapminder_2007&lt;/code&gt; tibble&lt;/li&gt;
&lt;li&gt;Order the tibble based on the &lt;code&gt;lifeExp&lt;/code&gt; column&lt;/li&gt;
&lt;/ol&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/03-dplyr/04-sort-data-frame-by-columns/exercise-03-04-00"&gt;Start Exercise&lt;/a&gt;
&lt;h2&gt;Exercise: Use arrange() in combination with desc()&lt;/h2&gt;
&lt;p&gt;The &lt;code&gt;gapminder_2007&lt;/code&gt; dataset contains economic and demographic data about various countries for the year 2007. Arrange the tibble and inspect which countries had the largest population in 2007! The &lt;strong&gt;dplyr&lt;/strong&gt; package is already loaded.&lt;/p&gt;
&lt;ol style="list-style-type: decimal"&gt;
&lt;li&gt;Apply the &lt;code&gt;arrange()&lt;/code&gt; function on the &lt;code&gt;gapminder_2007&lt;/code&gt; tibble.&lt;/li&gt;
&lt;li&gt;Sort the tibble in a descending order based on the &lt;code&gt;pop&lt;/code&gt; column.&lt;/li&gt;
&lt;/ol&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/03-dplyr/04-sort-data-frame-by-columns/exercise-03-04-01"&gt;Start Exercise&lt;/a&gt;
&lt;h2&gt;The arrange() function with multiple columns&lt;/h2&gt;
&lt;p&gt;We can use the &lt;code&gt;arrange()&lt;/code&gt; function on multiple columns as well. In this case the order of the columns in the function parameters sets a hierarchy of ordering. The function starts by ordering the rows based on the first column defined in the parameters. In case there are several rows with the same value, the function decides the order based on the second column defined in the parameters. If there are still multiple rows with the same values, the function decides based on the third column defined in the parameters (if defined) and so on.&lt;/p&gt;
&lt;p&gt;In the following example we use the &lt;code&gt;pres_results_subset&lt;/code&gt; data frame, containing election results only for the states &lt;code&gt;"TX"&lt;/code&gt; (Texas), &lt;code&gt;"UT"&lt;/code&gt; (Utah) and &lt;code&gt;"FL"&lt;/code&gt; (Florida). First we sort the data frame in ascending order based on the &lt;code&gt;year&lt;/code&gt; column. Then, we add a second level and order the data frame based on the &lt;code&gt;dem&lt;/code&gt; column:&lt;/p&gt;
&lt;pre&gt;arrange(pres_results_subset, year, dem)&lt;/pre&gt;
&lt;pre&gt;# A tibble: 33 x 6
   year state total_votes   dem   rep   other
  &lt;dbl&gt; &lt;chr&gt;       &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;   &lt;dbl&gt;
1  1976 UT         541218 0.336 0.624 0.0392 
2  1976 TX        4071884 0.511 0.480 0.00817
3  1976 FL        3150631 0.519 0.466 0.0143 
# … with 30 more rows&lt;/pre&gt;
&lt;p&gt;As you can see in the output, the data frame is overall ordered based on the &lt;code&gt;year&lt;/code&gt; column. However, when the value of &lt;code&gt;year&lt;/code&gt; is the same, the order of the rows is decided by the &lt;code&gt;dem&lt;/code&gt; column.&lt;/p&gt;
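&lt;p&gt;Sort directions can also be mixed across columns by wrapping individual columns in &lt;code&gt;desc()&lt;/code&gt;. The following sketch uses a hypothetical tibble &lt;code&gt;results&lt;/code&gt; with made-up values; it is not the &lt;code&gt;pres_results_subset&lt;/code&gt; data.&lt;/p&gt;
&lt;pre&gt;library(dplyr)

# Hypothetical toy data (made-up values).
results &lt;- tibble(
  year  = c(2016, 2012, 2016, 2012),
  state = c("AA", "AA", "BB", "BB"),
  dem   = c(0.40, 0.55, 0.60, 0.45)
)

# First level: year in ascending order.
# Second level: within the same year, dem in descending order.
sorted &lt;- arrange(results, year, desc(dem))
sorted&lt;/pre&gt;
&lt;p&gt;The second column only matters for rows that tie on the first one, so here the two 2012 rows come first, and within each year the row with the higher &lt;code&gt;dem&lt;/code&gt; value is listed first.&lt;/p&gt;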
&lt;h2&gt;Exercise: Use arrange() based on multiple columns&lt;/h2&gt;
&lt;p&gt;The &lt;code&gt;gapminder_2007&lt;/code&gt; tibble contains economic and demographic data about various countries for the year 2007. Arrange the tibble and inspect for each continent, which countries had the highest life expectancy in 2007! The &lt;strong&gt;dplyr&lt;/strong&gt; package is already loaded.&lt;/p&gt;
&lt;ol style="list-style-type: decimal"&gt;
&lt;li&gt;Apply the &lt;code&gt;arrange()&lt;/code&gt; function on the &lt;code&gt;gapminder_2007&lt;/code&gt; tibble.&lt;/li&gt;
&lt;li&gt;Order the tibble based on the &lt;code&gt;continent&lt;/code&gt; column!&lt;/li&gt;
&lt;li&gt;In case there are rows with the same &lt;code&gt;continent&lt;/code&gt;, sort the tibble in a descending order based on the &lt;code&gt;lifeExp&lt;/code&gt; column!&lt;/li&gt;
&lt;/ol&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/03-dplyr/04-sort-data-frame-by-columns/exercise-03-04-02"&gt;Start Exercise&lt;/a&gt;
&lt;h2&gt;Quiz: arrange() Function&lt;/h2&gt;
Which of the following statements are true about the &lt;code&gt;arrange()&lt;/code&gt; function?
&lt;ul&gt;&lt;li&gt;The &lt;code&gt;arrange()&lt;/code&gt; function orders the rows of a data frame.&lt;/li&gt;&lt;li&gt;To &lt;code&gt;arrange()&lt;/code&gt; the values of a column in ascending order, we need to use the &lt;code&gt;asc()&lt;/code&gt; function.&lt;/li&gt;&lt;li&gt;To &lt;code&gt;arrange()&lt;/code&gt; the values of a column in descending order, we need to use the &lt;code&gt;desc()&lt;/code&gt; function.&lt;/li&gt;&lt;li&gt;You can only &lt;code&gt;arrange()&lt;/code&gt; a data frame based on one column.&lt;/li&gt;&lt;/ul&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/03-dplyr/04-sort-data-frame-by-columns/quiz-1"&gt;Start Quiz&lt;/a&gt;
&lt;p&gt;Sort data frames by columns is an excerpt from the course &lt;a href="https://www.quantargo.com/courses/course-r-introduction"&gt;Introduction to R&lt;/a&gt;, which is available for free at &lt;a href="https://www.quantargo.com"&gt;quantargo.com&lt;/a&gt;&lt;/p&gt;&lt;p&gt;&lt;a href="https://www.quantargo.com/courses/course-r-introduction"&gt;VIEW FULL COURSE&lt;/a&gt;</description></item><item><title>Filter data frame rows</title><link>https://www.quantargo.com/courses/course-r-introduction/03-dplyr/03-filter-data-frame-rows</link><pubDate>Fri, 26 Jun 2020 18:56:10 +0000</pubDate><guid>https://www.quantargo.com/courses/course-r-introduction/03-dplyr/03-filter-data-frame-rows</guid><category>R</category><category>Course</category><category>Introduction</category><category>Basics</category><category>Interactive</category><description>&lt;img src="https://www.quantargo.com/assets/courses/course-r-introduction/03-dplyr/03-filter-data-frame-rows.png"&gt;
&lt;p&gt;We often want to operate only on a specific subset of rows of a data frame. The &lt;strong&gt;dplyr&lt;/strong&gt; &lt;code&gt;filter()&lt;/code&gt; function provides a flexible way to extract the rows of interest based on multiple conditions.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Use the &lt;code&gt;filter()&lt;/code&gt; function to extract the rows of a data frame that fulfill a specified condition&lt;/li&gt;
&lt;li&gt;Filter a data frame by multiple conditions&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;filter(my_data_frame, condition)
filter(my_data_frame, condition_one, condition_two, ...)&lt;/pre&gt;
&lt;h2&gt;The filter() function&lt;/h2&gt;
&lt;p&gt;The &lt;code&gt;filter()&lt;/code&gt; function takes a data frame and one or more filtering expressions as input parameters. It processes the data frame and keeps only the rows that fulfill the defined filtering expressions. These expressions can be seen as rules for the evaluation and keeping of rows. In the majority of the cases, they are based on relational operators. As an example, we could filter the &lt;code&gt;pres_results&lt;/code&gt; data frame and keep only the rows, where the &lt;code&gt;state&lt;/code&gt; variable is equal to &lt;code&gt;"CA"&lt;/code&gt; (California):&lt;/p&gt;
&lt;pre&gt;filter(pres_results, state == "CA")&lt;/pre&gt;
&lt;pre&gt;# A tibble: 11 x 6
    year state total_votes   dem   rep  other
   &lt;dbl&gt; &lt;chr&gt;       &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;  &lt;dbl&gt;
 1  1976 CA        7803770 0.480 0.497 0.0230
 2  1980 CA        8582938 0.359 0.527 0.114 
 3  1984 CA        9505041 0.413 0.575 0.0122
 4  1988 CA        9887065 0.476 0.511 0.0131
 5  1992 CA       11131721 0.460 0.326 0.213 
 6  1996 CA       10019469 0.511 0.382 0.107 
 7  2000 CA       10965822 0.534 0.417 0.0490
 8  2004 CA       12421353 0.543 0.444 0.0117
 9  2008 CA       13561900 0.610 0.370 0.0188
10  2012 CA       13038547 0.602 0.371 0.0246
11  2016 CA       14181595 0.617 0.316 0.0581&lt;/pre&gt;
&lt;p&gt;In the output, we can compare the election results in California for different years.&lt;/p&gt;
&lt;p&gt;As another example, we could filter the &lt;code&gt;pres_results&lt;/code&gt; data frame and keep only those rows, where the &lt;code&gt;dem&lt;/code&gt; variable (percentage of votes for the Democratic Party) is greater than 0.85:&lt;/p&gt;
&lt;pre&gt;filter(pres_results, dem &gt; 0.85)&lt;/pre&gt;
&lt;pre&gt;# A tibble: 7 x 6
   year state total_votes   dem    rep   other
  &lt;dbl&gt; &lt;chr&gt;       &lt;dbl&gt; &lt;dbl&gt;  &lt;dbl&gt;   &lt;dbl&gt;
1  1984 DC         211288 0.854 0.137  0.00886
2  1996 DC         185726 0.852 0.0934 0.0513 
3  2000 DC         201894 0.852 0.0895 0.0563 
4  2004 DC         227586 0.892 0.0934 0.0125 
5  2008 DC         265853 0.925 0.0653 0.00582
6  2012 DC         293764 0.909 0.0728 0.0155 
7  2016 DC         312575 0.905 0.0407 0.0335 &lt;/pre&gt;
&lt;p&gt;In the output we can see for each election year the states where the Democratic Party got over 85% of the votes. Based on the results, we could say that the Democratic Party has a solid voter base in the District of Columbia (known as Washington, D.C.).&lt;/p&gt;
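&lt;p&gt;The same kind of filtering can be reproduced without the course data. The sketch below assumes a hypothetical tibble &lt;code&gt;results&lt;/code&gt; with made-up values and applies one relational operator per call.&lt;/p&gt;
&lt;pre&gt;library(dplyr)

# Hypothetical toy data (made-up values, not real election results).
results &lt;- tibble(
  state = c("AA", "BB", "AA", "BB"),
  year  = c(2012, 2012, 2016, 2016),
  dem   = c(0.55, 0.45, 0.40, 0.90)
)

# Keep the rows where state is equal to "AA".
only_aa &lt;- filter(results, state == "AA")

# Keep the rows where dem is greater than 0.85.
high_dem &lt;- filter(results, dem &gt; 0.85)

only_aa
high_dem&lt;/pre&gt;
&lt;p&gt;Each condition is evaluated row by row, and only the rows for which it is &lt;code&gt;TRUE&lt;/code&gt; are kept in the result.&lt;/p&gt;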
&lt;h2&gt;Exercise: Use filter() with a single expression&lt;/h2&gt;
&lt;p&gt;The &lt;code&gt;gapminder&lt;/code&gt; dataset contains economic and demographic data about various countries since 1952.&lt;/p&gt;
&lt;p&gt;Inspect the data for a single year by using the &lt;code&gt;filter()&lt;/code&gt; function.&lt;/p&gt;
&lt;ol style="list-style-type: decimal"&gt;
&lt;li&gt;Apply the &lt;code&gt;filter()&lt;/code&gt; function on the &lt;code&gt;gapminder&lt;/code&gt; dataset&lt;/li&gt;
&lt;li&gt;Keep only the rows where the &lt;code&gt;year&lt;/code&gt; is equal to 2007&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Note that the &lt;strong&gt;dplyr&lt;/strong&gt; and &lt;strong&gt;gapminder&lt;/strong&gt; packages are already loaded.&lt;/p&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/03-dplyr/03-filter-data-frame-rows/exercise-03-03-00"&gt;Start Exercise&lt;/a&gt;
&lt;h2&gt;Quiz: filter() Function&lt;/h2&gt;
Which of the following statements about the &lt;code&gt;filter()&lt;/code&gt; function are correct?
&lt;ul&gt;&lt;li&gt;Relational operators, such as &lt;code&gt;==&lt;/code&gt; or &lt;code&gt;&amp;gt;&lt;/code&gt;, are frequently part of the filtering expressions.&lt;/li&gt;&lt;li&gt;The &lt;code&gt;filter()&lt;/code&gt; function comes in the &lt;strong&gt;dplyr&lt;/strong&gt; package.&lt;/li&gt;&lt;li&gt;Only numeric variables can be filtered.&lt;/li&gt;&lt;li&gt;The &lt;code&gt;filter()&lt;/code&gt; function works only on data frames, not on tibbles.&lt;/li&gt;&lt;/ul&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/03-dplyr/03-filter-data-frame-rows/quiz-1"&gt;Start Quiz&lt;/a&gt;
&lt;h2&gt;Multiple filter expressions&lt;/h2&gt;
&lt;pre&gt;filter(my_data_frame, condition)
filter(my_data_frame, condition_one, condition_two, ...)&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;filter()&lt;/code&gt; function can take multiple filtering rules as input as well. These can be seen as a combination of rules with the &lt;code&gt;&amp;amp;&lt;/code&gt; operator. In order for a row to be included in the output, all filtering rules must be fulfilled by it. In the following example, we filter the &lt;code&gt;pres_results&lt;/code&gt; data frame for all rows where the &lt;code&gt;state&lt;/code&gt; variable is equal to &lt;code&gt;"CA"&lt;/code&gt; and the &lt;code&gt;year&lt;/code&gt; variable is equal to 2016:&lt;/p&gt;
&lt;pre&gt;filter(pres_results, state == "CA", year == 2016)&lt;/pre&gt;
&lt;pre&gt;# A tibble: 1 x 6
   year state total_votes   dem   rep  other
  &lt;dbl&gt; &lt;chr&gt;       &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;  &lt;dbl&gt;
1  2016 CA       14181595 0.617 0.316 0.0581&lt;/pre&gt;
&lt;p&gt;We get a single row as output, containing the 2016 US presidential election results for the state of California.&lt;/p&gt;
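&lt;p&gt;As a quick check that comma-separated rules behave like a single &lt;code&gt;&amp;amp;&lt;/code&gt; expression, the following sketch compares both spellings. It assumes a small hand-built tibble, &lt;code&gt;pres_sample&lt;/code&gt; (a name made up for this illustration), that holds three rows copied from the outputs in this chapter rather than the full &lt;code&gt;pres_results&lt;/code&gt; dataset:&lt;/p&gt;
&lt;pre&gt;library(dplyr)

# Hand-built sample: three rows copied from the outputs above (not the full dataset)
pres_sample &lt;- tibble(
  year        = c(2012, 2016, 2016),
  state       = c("CA", "CA", "DC"),
  total_votes = c(13038547, 14181595, 312575),
  dem         = c(0.602, 0.617, 0.905)
)

# Two rules separated by a comma ...
by_comma &lt;- filter(pres_sample, state == "CA", year == 2016)

# ... and the same two rules joined with the &amp;amp; operator
by_and &lt;- filter(pres_sample, state == "CA" &amp;amp; year == 2016)

# Compare the two results
identical(by_comma, by_and)&lt;/pre&gt;
&lt;p&gt;Both calls should keep only the 2016 row for California, because a row has to fulfil every rule in either spelling.&lt;/p&gt;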
&lt;h2&gt;Exercise: Use filter() with multiple rules&lt;/h2&gt;
&lt;p&gt;The &lt;code&gt;gapminder&lt;/code&gt; dataset contains economic and demographic data about various countries since 1952. Filter the tibble and inspect which countries had a life expectancy over 80 years in the year 2007! The required packages are already loaded.&lt;/p&gt;
&lt;ol style="list-style-type: decimal"&gt;
&lt;li&gt;Use the &lt;code&gt;filter()&lt;/code&gt; function on the gapminder tibble.&lt;/li&gt;
&lt;li&gt;Filter all rows where the &lt;code&gt;year&lt;/code&gt; variable is equal to 2007 and the life expectancy &lt;code&gt;lifeExp&lt;/code&gt; is greater than 80!&lt;/li&gt;
&lt;/ol&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/03-dplyr/03-filter-data-frame-rows/exercise-03-03-02"&gt;Start Exercise&lt;/a&gt;
&lt;h2&gt;Exercise&lt;/h2&gt;
&lt;p&gt;The &lt;code&gt;gapminder&lt;/code&gt; dataset contains economic and demographic data about various countries since 1952. Filter the &lt;code&gt;gapminder&lt;/code&gt; tibble and inspect which countries had a population of over one billion (1,000,000,000) in the year 2007! The required packages are already loaded.&lt;/p&gt;
&lt;ol style="list-style-type: decimal"&gt;
&lt;li&gt;Use the &lt;code&gt;filter()&lt;/code&gt; function on the gapminder tibble.&lt;/li&gt;
&lt;li&gt;Filter all rows where the &lt;code&gt;year&lt;/code&gt; variable is equal to 2007 and the population &lt;code&gt;pop&lt;/code&gt; is greater than 1000000000!&lt;/li&gt;
&lt;/ol&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/03-dplyr/03-filter-data-frame-rows/exercise-03-03-03"&gt;Start Exercise&lt;/a&gt;
&lt;p&gt;Filter data frame rows is an excerpt from the course &lt;a href="https://www.quantargo.com/courses/course-r-introduction"&gt;Introduction to R&lt;/a&gt;, which is available for free at &lt;a href="https://www.quantargo.com"&gt;quantargo.com&lt;/a&gt;&lt;/p&gt;&lt;p&gt;&lt;a href="https://www.quantargo.com/courses/course-r-introduction"&gt;VIEW FULL COURSE&lt;/a&gt;</description></item><item><title>Select columns from a data frame</title><link>https://www.quantargo.com/courses/course-r-introduction/03-dplyr/02-select-columns-data-frame</link><pubDate>Fri, 19 Jun 2020 17:02:06 +0000</pubDate><guid>https://www.quantargo.com/courses/course-r-introduction/03-dplyr/02-select-columns-data-frame</guid><category>R</category><category>Course</category><category>Introduction</category><category>Basics</category><category>Interactive</category><description>&lt;img src="https://www.quantargo.com/assets/courses/course-r-introduction/03-dplyr/02-select-columns-data-frame.png"&gt;
&lt;p&gt;To select only a specific set of interesting data frame columns, &lt;strong&gt;dplyr&lt;/strong&gt; offers the &lt;code&gt;select()&lt;/code&gt; function to extract columns by names, indices and ranges. You can even rename extracted columns with &lt;code&gt;select()&lt;/code&gt;.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Learn to use the &lt;code&gt;select()&lt;/code&gt; function&lt;/li&gt;
&lt;li&gt;Select columns from a data frame by name or index&lt;/li&gt;
&lt;li&gt;Rename columns from a data frame&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;select(my_data_frame, column_one, column_two, ...)
select(my_data_frame, new_column_name = current_column, ...)
select(my_data_frame, column_start:column_end)
select(my_data_frame, index_one, index_two, ...)
select(my_data_frame, index_start:index_end)&lt;/pre&gt;
&lt;h2&gt;Selecting by name&lt;/h2&gt;
&lt;pre&gt;select(my_data_frame, column_one, column_two, ...)&lt;/pre&gt;
&lt;p&gt;In this chapter we will have a look at the &lt;code&gt;pres_results&lt;/code&gt; dataset from the &lt;strong&gt;politicaldata&lt;/strong&gt; package. It contains data about US presidential elections since 1976, converted to a tibble for nicer printing.&lt;/p&gt;
&lt;pre&gt;# A tibble: 561 x 6
   year state total_votes   dem   rep   other
  &lt;dbl&gt; &lt;chr&gt;       &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;   &lt;dbl&gt;
1  1976 AK         123574 0.357 0.579 0.0549 
2  1976 AL        1182850 0.557 0.426 0.0163 
3  1976 AR         767535 0.650 0.349 0.00134
# … with 558 more rows&lt;/pre&gt;
&lt;p&gt;For this example, we will have a look at the total number of votes in different states in different elections. Since we are only interested in the number of people who voted, we would like to create a custom version of the &lt;code&gt;pres_results&lt;/code&gt; data frame that only contains the columns &lt;code&gt;year&lt;/code&gt;, &lt;code&gt;state&lt;/code&gt; and &lt;code&gt;total_votes&lt;/code&gt;. For this kind of column selection, we can use the &lt;code&gt;select()&lt;/code&gt; function from the &lt;strong&gt;dplyr&lt;/strong&gt; package.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;select()&lt;/code&gt; function takes a data frame as an input parameter and lets us decide which of the columns we want to keep from it. The output of the function is a data frame with all rows, but containing only the columns we explicitly select.&lt;/p&gt;
&lt;p&gt;We can reduce our dataset to only &lt;code&gt;year&lt;/code&gt;, &lt;code&gt;state&lt;/code&gt; and &lt;code&gt;total_votes&lt;/code&gt; in the following way:&lt;/p&gt;
&lt;pre&gt;select(pres_results, year, state, total_votes)&lt;/pre&gt;
&lt;pre&gt;# A tibble: 561 x 3
   year state total_votes
  &lt;dbl&gt; &lt;chr&gt;       &lt;dbl&gt;
1  1976 AK         123574
2  1976 AL        1182850
3  1976 AR         767535
# … with 558 more rows&lt;/pre&gt;
&lt;p&gt;As the first parameter we passed the &lt;code&gt;pres_results&lt;/code&gt; data frame. As the remaining parameters we passed the columns we want &lt;code&gt;select()&lt;/code&gt; to keep.&lt;/p&gt;
&lt;p&gt;Apart from keeping the columns we want, the &lt;code&gt;select()&lt;/code&gt; function also keeps them in the same order as we specified in the function parameters.&lt;/p&gt;
&lt;p&gt;If we change the order of the parameters when we call the function, the columns of the output change accordingly:&lt;/p&gt;
&lt;pre&gt;select(pres_results, total_votes, year, state)&lt;/pre&gt;
&lt;pre&gt;# A tibble: 561 x 3
  total_votes  year state
        &lt;dbl&gt; &lt;dbl&gt; &lt;chr&gt;
1      123574  1976 AK   
2     1182850  1976 AL   
3      767535  1976 AR   
# … with 558 more rows&lt;/pre&gt;
&lt;h2&gt;Exercise: Life expectancy in Austria&lt;/h2&gt;
&lt;p&gt;The &lt;code&gt;gapminder_austria&lt;/code&gt; dataset contains information about the economic and demographic change in Austria over the last decades. To inspect how the life expectancy in Austria changed over time, create a subset of the tibble that contains only the necessary columns for this task:&lt;/p&gt;
&lt;ol style="list-style-type: decimal"&gt;
&lt;li&gt;Use the &lt;strong&gt;dplyr&lt;/strong&gt; &lt;code&gt;select()&lt;/code&gt; function and define &lt;code&gt;gapminder_austria&lt;/code&gt; as the input tibble.&lt;/li&gt;
&lt;li&gt;Keep only the columns &lt;code&gt;year&lt;/code&gt; and &lt;code&gt;lifeExp&lt;/code&gt; in the output dataset.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Note that the &lt;strong&gt;dplyr&lt;/strong&gt; package is already loaded.&lt;/p&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/03-dplyr/02-select-columns-data-frame/exercise-03-02-00"&gt;Start Exercise&lt;/a&gt;
&lt;h2&gt;Renaming columns&lt;/h2&gt;
&lt;pre&gt;select(my_data_frame, new_column_name = current_column, ...)&lt;/pre&gt;
&lt;p&gt;In addition to defining the columns we want to keep, we can also rename them. To do this, we need to set the new column name inside the &lt;code&gt;select()&lt;/code&gt; function using the syntax&lt;/p&gt;
&lt;pre&gt;new_column_name = current_column&lt;/pre&gt;
&lt;p&gt;In the following example, we select the columns &lt;code&gt;year&lt;/code&gt;, &lt;code&gt;state&lt;/code&gt; and &lt;code&gt;total_votes&lt;/code&gt; but rename the &lt;code&gt;year&lt;/code&gt; column to &lt;code&gt;Election&lt;/code&gt; in the output:&lt;/p&gt;
&lt;pre&gt;select(pres_results, Election = year, state, total_votes)&lt;/pre&gt;
&lt;pre&gt;# A tibble: 561 x 3
  Election state total_votes
     &lt;dbl&gt; &lt;chr&gt;       &lt;dbl&gt;
1     1976 AK         123574
2     1976 AL        1182850
3     1976 AR         767535
# … with 558 more rows&lt;/pre&gt;
&lt;h2&gt;Exercise: Rename columns&lt;/h2&gt;
&lt;p&gt;The &lt;code&gt;gapminder_india&lt;/code&gt; dataset contains information about the economic and demographic change in India over the last decades. Inspect how the population in India changed over time:&lt;/p&gt;
&lt;ol style="list-style-type: decimal"&gt;
&lt;li&gt;Use the &lt;code&gt;select()&lt;/code&gt; function and define &lt;code&gt;gapminder_india&lt;/code&gt; as the input tibble.&lt;/li&gt;
&lt;li&gt;Keep only the columns &lt;code&gt;year&lt;/code&gt; and &lt;code&gt;pop&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Rename the &lt;code&gt;pop&lt;/code&gt; column to &lt;code&gt;population&lt;/code&gt; in the output tibble.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Note that the &lt;strong&gt;dplyr&lt;/strong&gt; package is already loaded.&lt;/p&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/03-dplyr/02-select-columns-data-frame/exercise-03-02-01"&gt;Start Exercise&lt;/a&gt;
&lt;h2&gt;Selecting by name range&lt;/h2&gt;
&lt;pre&gt;select(my_data_frame, column_start:column_end)&lt;/pre&gt;
&lt;p&gt;When we use the &lt;code&gt;select()&lt;/code&gt; function and define the columns we want to keep, &lt;strong&gt;dplyr&lt;/strong&gt; does not actually work with the names of the columns but with their positions (indices) in the data frame. This means that when we specify the first three columns of the &lt;code&gt;pres_results&lt;/code&gt; data frame, &lt;code&gt;year&lt;/code&gt;, &lt;code&gt;state&lt;/code&gt; and &lt;code&gt;total_votes&lt;/code&gt;, &lt;strong&gt;dplyr&lt;/strong&gt; converts these names to the index values &lt;code&gt;1&lt;/code&gt;, &lt;code&gt;2&lt;/code&gt; and &lt;code&gt;3&lt;/code&gt;. We can therefore apply the &lt;code&gt;:&lt;/code&gt; operator to column names and define ranges of columns that we want to keep:&lt;/p&gt;
&lt;pre&gt;select(pres_results, year:total_votes)&lt;/pre&gt;
&lt;pre&gt;# A tibble: 561 x 3
   year state total_votes
  &lt;dbl&gt; &lt;chr&gt;       &lt;dbl&gt;
1  1976 AK         123574
2  1976 AL        1182850
3  1976 AR         767535
# … with 558 more rows&lt;/pre&gt;
&lt;p&gt;The expression &lt;code&gt;year:total_votes&lt;/code&gt; is translated to &lt;code&gt;1:3&lt;/code&gt;, which simply creates a vector of the numerical values from 1 to 3. The &lt;code&gt;select()&lt;/code&gt; function then takes the &lt;code&gt;pres_results&lt;/code&gt; data frame and outputs a subset of it, keeping only the first three columns.&lt;/p&gt;
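&lt;p&gt;The correspondence between a name range and an index range can be checked directly. This is a minimal sketch, assuming a hand-built tibble called &lt;code&gt;pres_sample&lt;/code&gt; (a hypothetical name) that contains only the three rows printed above instead of the full dataset:&lt;/p&gt;
&lt;pre&gt;library(dplyr)

# Hand-built sample: the three rows printed above, not the full dataset
pres_sample &lt;- tibble(
  year        = c(1976, 1976, 1976),
  state       = c("AK", "AL", "AR"),
  total_votes = c(123574, 1182850, 767535),
  dem         = c(0.357, 0.557, 0.650),
  rep         = c(0.579, 0.426, 0.349),
  other       = c(0.0549, 0.0163, 0.00134)
)

# Range given by column names ...
by_name &lt;- select(pres_sample, year:total_votes)

# ... and the same range given by column positions
by_index &lt;- select(pres_sample, 1:3)

names(by_name)
identical(by_name, by_index)&lt;/pre&gt;
&lt;p&gt;Both calls should return the columns &lt;code&gt;year&lt;/code&gt;, &lt;code&gt;state&lt;/code&gt; and &lt;code&gt;total_votes&lt;/code&gt;. Note that a name range depends on the column order of the data frame, so reordering the columns changes what it selects.&lt;/p&gt;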
&lt;h2&gt;Exercise: Select a name range&lt;/h2&gt;
&lt;p&gt;The &lt;code&gt;gapminder_europe_2007&lt;/code&gt; dataset contains economic and demographic information about European countries for the year 2007:&lt;/p&gt;
&lt;pre&gt;# A tibble: 30 x 6
  country continent  year lifeExp      pop gdpPercap
  &lt;fct&gt;   &lt;fct&gt;     &lt;int&gt;   &lt;dbl&gt;    &lt;int&gt;     &lt;dbl&gt;
1 Albania Europe     2007    76.4  3600523     5937.
2 Austria Europe     2007    79.8  8199783    36126.
3 Belgium Europe     2007    79.4 10392226    33693.
# … with 27 more rows&lt;/pre&gt;
&lt;p&gt;Create a subset of the tibble and compare the life expectancy in different European countries for the year 2007:&lt;/p&gt;
&lt;ol style="list-style-type: decimal"&gt;
&lt;li&gt;Apply the &lt;code&gt;select()&lt;/code&gt; function on the &lt;code&gt;gapminder_europe_2007&lt;/code&gt; tibble.&lt;/li&gt;
&lt;li&gt;Use the &lt;code&gt;:&lt;/code&gt; operator and select the columns from &lt;code&gt;country&lt;/code&gt; to &lt;code&gt;lifeExp&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Note that the &lt;strong&gt;dplyr&lt;/strong&gt; package is already loaded.&lt;/p&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/03-dplyr/02-select-columns-data-frame/exercise-03-02-02"&gt;Start Exercise&lt;/a&gt;
&lt;h2&gt;Selecting by indices&lt;/h2&gt;
&lt;pre&gt;select(my_data_frame, index_one, index_two, ...)
select(my_data_frame, index_start:index_end)&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;select()&lt;/code&gt; function can be used with column indices as well. Instead of using names, we specify the columns we want to select by their indices. Unlike many other programming languages, indexing in R starts at &lt;em&gt;one&lt;/em&gt; rather than &lt;em&gt;zero&lt;/em&gt;. To select the first, fourth and fifth column from the &lt;code&gt;pres_results&lt;/code&gt; dataset we can write:&lt;/p&gt;
&lt;pre&gt;select(pres_results, 1, 4, 5)&lt;/pre&gt;
&lt;pre&gt;# A tibble: 561 x 3
   year   dem   rep
  &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;
1  1976 0.357 0.579
2  1976 0.557 0.426
3  1976 0.650 0.349
# … with 558 more rows&lt;/pre&gt;
&lt;p&gt;Similarly to defining ranges of columns using their names, we can define ranges (or vectors) of index values instead:&lt;/p&gt;
&lt;pre&gt;select(pres_results, 1:3)&lt;/pre&gt;
&lt;pre&gt;# A tibble: 561 x 3
   year state total_votes
  &lt;dbl&gt; &lt;chr&gt;       &lt;dbl&gt;
1  1976 AK         123574
2  1976 AL        1182850
3  1976 AR         767535
# … with 558 more rows&lt;/pre&gt;
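&lt;p&gt;Since index values can also be passed as a vector, here is a minimal sketch of that variant. It assumes a hand-built tibble, &lt;code&gt;pres_sample&lt;/code&gt; (a name chosen for this illustration), filled with the three rows shown in the outputs above:&lt;/p&gt;
&lt;pre&gt;library(dplyr)

# Hand-built sample: the three rows printed above, not the full dataset
pres_sample &lt;- tibble(
  year        = c(1976, 1976, 1976),
  state       = c("AK", "AL", "AR"),
  total_votes = c(123574, 1182850, 767535),
  dem         = c(0.357, 0.557, 0.650),
  rep         = c(0.579, 0.426, 0.349),
  other       = c(0.0549, 0.0163, 0.00134)
)

# Individual indices and one vector of indices select the same columns
by_indices &lt;- select(pres_sample, 1, 4, 5)
by_vector  &lt;- select(pres_sample, c(1, 4, 5))

names(by_vector)
identical(by_indices, by_vector)&lt;/pre&gt;
&lt;p&gt;A vector is handy when the positions are computed or stored elsewhere in the code, although names are usually more robust if the column order may change.&lt;/p&gt;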
&lt;h2&gt;Exercise: Select by indices&lt;/h2&gt;
&lt;p&gt;The &lt;code&gt;gapminder_europe_2007&lt;/code&gt; dataset contains economic and demographic information about European countries for the year 2007.&lt;/p&gt;
&lt;pre&gt;# A tibble: 30 x 6
  country continent  year lifeExp      pop gdpPercap
  &lt;fct&gt;   &lt;fct&gt;     &lt;int&gt;   &lt;dbl&gt;    &lt;int&gt;     &lt;dbl&gt;
1 Albania Europe     2007    76.4  3600523     5937.
2 Austria Europe     2007    79.8  8199783    36126.
3 Belgium Europe     2007    79.4 10392226    33693.
# … with 27 more rows&lt;/pre&gt;
&lt;p&gt;Create a subset of the dataset and compare the GDP per capita of the European countries for the year 2007:&lt;/p&gt;
&lt;ol style="list-style-type: decimal"&gt;
&lt;li&gt;Apply the &lt;code&gt;select()&lt;/code&gt; function on the &lt;code&gt;gapminder_europe_2007&lt;/code&gt; tibble.&lt;/li&gt;
&lt;li&gt;Keep the columns &lt;code&gt;country&lt;/code&gt; and &lt;code&gt;gdpPercap&lt;/code&gt;, but use only the index of the columns (&lt;code&gt;1&lt;/code&gt; and &lt;code&gt;6&lt;/code&gt;) for this step.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Note that the &lt;strong&gt;dplyr&lt;/strong&gt; package is already loaded.&lt;/p&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/03-dplyr/02-select-columns-data-frame/exercise-03-02-03"&gt;Start Exercise&lt;/a&gt;
&lt;p&gt;Select columns from a data frame is an excerpt from the course &lt;a href="https://www.quantargo.com/courses/course-r-introduction"&gt;Introduction to R&lt;/a&gt;, which is available for free at &lt;a href="https://www.quantargo.com"&gt;quantargo.com&lt;/a&gt;&lt;/p&gt;&lt;p&gt;&lt;a href="https://www.quantargo.com/courses/course-r-introduction"&gt;VIEW FULL COURSE&lt;/a&gt;</description></item><item><title>Introduction to dplyr</title><link>https://www.quantargo.com/courses/course-r-introduction/03-dplyr/01-introduction-to-dplyr</link><pubDate>Tue, 16 Jun 2020 06:12:20 +0000</pubDate><guid>https://www.quantargo.com/courses/course-r-introduction/03-dplyr/01-introduction-to-dplyr</guid><category>R</category><category>Course</category><category>Introduction</category><category>Basics</category><category>Interactive</category><description>&lt;img src="https://www.quantargo.com/assets/courses/course-r-introduction/03-dplyr/01-introduction-to-dplyr.png"&gt;
&lt;p&gt;&lt;strong&gt;dplyr&lt;/strong&gt; facilitates the data transformation process by providing a rich framework to manipulate data frames. &lt;strong&gt;dplyr&lt;/strong&gt; functions can be chained into powerful transformation pipelines to select, filter, sort, join and aggregate data.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Learn what &lt;strong&gt;dplyr&lt;/strong&gt; does&lt;/li&gt;
&lt;li&gt;Get an overview of Select, Filter and Sort&lt;/li&gt;
&lt;li&gt;Learn what Joins, Aggregations and Pipelines are&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;What is dplyr&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;There’s the joke that 80 percent of data science is cleaning the data and 20 percent is complaining about cleaning the data.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Anthony Goldbloom, Founder and CEO of Kaggle&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Having &lt;em&gt;clean&lt;/em&gt; data is crucial in any Data Science project, because the results can only be as good as the underlying data. Cleaning data is also the part that usually consumes most of the time and causes the biggest pains for data scientists. R already offers a broad set of tools and functions to manipulate data frames. However, due to its long history, the available base R tool set is fragmented and hard to use for new users.&lt;/p&gt;
&lt;p&gt;The &lt;strong&gt;dplyr&lt;/strong&gt; package facilitates the data transformation process through a consistent collection of functions. These functions support different transformations on data frames, including&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;filter rows&lt;/li&gt;
&lt;li&gt;select columns&lt;/li&gt;
&lt;li&gt;sort data&lt;/li&gt;
&lt;li&gt;aggregate data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Multiple data frames can also be joined together by common attribute values.&lt;/p&gt;
&lt;p&gt;The consistency of &lt;strong&gt;dplyr&lt;/strong&gt; functions improves usability and enables users to connect transformations to form &lt;em&gt;data pipelines&lt;/em&gt;. These pipelines can also be seen as a high-level query language, much like SQL for database queries. Additionally, it is even possible to translate data pipelines to other back-ends, including databases.&lt;/p&gt;
&lt;h2&gt;Quiz: dplyr Facts&lt;/h2&gt;
Which of the below statements are correct?
&lt;ul&gt;&lt;li&gt;&lt;strong&gt;dplyr&lt;/strong&gt; provides a consistent set of functions for data visualization&lt;/li&gt;&lt;li&gt;&lt;strong&gt;dplyr&lt;/strong&gt; functions can be connected to data pipelines&lt;/li&gt;&lt;li&gt;&lt;strong&gt;dplyr&lt;/strong&gt; queries can be translated to database queries&lt;/li&gt;&lt;li&gt;&lt;strong&gt;dplyr&lt;/strong&gt; supports data transformations like aggregations and joins&lt;/li&gt;&lt;li&gt;&lt;strong&gt;dplyr&lt;/strong&gt; is built for vector transformations&lt;/li&gt;&lt;/ul&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/03-dplyr/01-introduction-to-dplyr/quiz-1"&gt;Start Quiz&lt;/a&gt;
&lt;h2&gt;Function Framework&lt;/h2&gt;
&lt;p&gt;Every data transformation function in &lt;strong&gt;dplyr&lt;/strong&gt; accepts a data frame as its first input parameter and returns the transformed data frame back as an output. A blueprint for a typical &lt;strong&gt;dplyr&lt;/strong&gt; function looks like this:&lt;/p&gt;
&lt;pre&gt;transformed &lt;- dplyr_function(my_data_frame, 
                              param_one, 
                              param_two, 
                              ...) &lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;dplyr_function&lt;/code&gt; can be customized further through additional arguments (&lt;code&gt;param_one&lt;/code&gt;, &lt;code&gt;param_two&lt;/code&gt;) placed after the first data frame parameter (&lt;code&gt;my_data_frame&lt;/code&gt;).&lt;/p&gt;
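&lt;p&gt;To make the blueprint concrete, here is a small sketch with &lt;code&gt;filter()&lt;/code&gt; in the role of &lt;code&gt;dplyr_function&lt;/code&gt;. The tibble &lt;code&gt;scores&lt;/code&gt; and its values are invented purely for this illustration:&lt;/p&gt;
&lt;pre&gt;library(dplyr)

# Invented toy data, purely for illustration
scores &lt;- tibble(
  name   = c("Ann", "Ben", "Cleo"),
  points = c(3, 1, 2)
)

# The data frame comes first, the additional argument follows
transformed &lt;- filter(scores, points &gt; 1)

# The result is again a data frame, here with fewer rows
is.data.frame(transformed)
nrow(transformed)&lt;/pre&gt;
&lt;p&gt;Because the output is itself a data frame, it can serve as the first argument of another &lt;strong&gt;dplyr&lt;/strong&gt; function.&lt;/p&gt;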
&lt;p&gt;The real power of &lt;strong&gt;dplyr&lt;/strong&gt; comes with the pipe operator &lt;code&gt;%&amp;gt;%&lt;/code&gt;, which allows users to chain &lt;strong&gt;dplyr&lt;/strong&gt; functions into data pipelines. The pipe injects the resulting data frame from the previous calculation as the first argument of the next one. A data transformation consisting of three functions looks like&lt;/p&gt;
&lt;pre&gt;dplyr_function_three(
  dplyr_function_two(
    dplyr_function_one(my_data_frame)))&lt;/pre&gt;
&lt;p&gt;but can be written with the pipe as&lt;/p&gt;
&lt;pre&gt;my_data_frame %&gt;%
  dplyr_function_one() %&gt;%
  dplyr_function_two() %&gt;%
  dplyr_function_three()&lt;/pre&gt;
&lt;p&gt;Because the functions appear in the same order in which the transformations are applied, pipelines are easier to read than nested function calls.&lt;/p&gt;
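&lt;p&gt;The two notations can be compared on a concrete case. The following sketch uses &lt;code&gt;filter()&lt;/code&gt;, &lt;code&gt;select()&lt;/code&gt; and &lt;code&gt;arrange()&lt;/code&gt; as stand-ins for the three placeholder functions. The tibble &lt;code&gt;scores&lt;/code&gt; and its values are invented for this illustration:&lt;/p&gt;
&lt;pre&gt;library(dplyr)

# Invented toy data, purely for illustration
scores &lt;- tibble(
  name   = c("Ann", "Ben", "Cleo"),
  points = c(3, 1, 2)
)

# Nested calls: read from the inside out
nested &lt;- arrange(select(filter(scores, points &gt; 1), name, points), points)

# Pipeline: read from top to bottom
piped &lt;- scores %&gt;%
  filter(points &gt; 1) %&gt;%
  select(name, points) %&gt;%
  arrange(points)

identical(nested, piped)&lt;/pre&gt;
&lt;p&gt;Both versions should produce the same tibble. Only the notation differs, not the computation.&lt;/p&gt;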
&lt;h2&gt;Quiz: Valid Functions&lt;/h2&gt;
&lt;code&gt;dplyr_function&lt;/code&gt; specifies the transformation function, &lt;code&gt;param_one&lt;/code&gt; the parameter for the &lt;strong&gt;dplyr&lt;/strong&gt; function and &lt;code&gt;input_data_frame&lt;/code&gt; the data frame to be transformed. Which of the code lines below are valid according to the &lt;strong&gt;dplyr&lt;/strong&gt; function framework?
&lt;ul&gt;&lt;li&gt;&lt;code&gt;dplyr_function(param_one, input_data_frame)&lt;/code&gt;&lt;/li&gt;&lt;li&gt;&lt;code&gt;dplyr_function(input_data_frame, param_one)&lt;/code&gt;&lt;/li&gt;&lt;li&gt;&lt;code&gt;input_data_frame(dplyr_function, param_one)&lt;/code&gt;&lt;/li&gt;&lt;li&gt;&lt;code&gt;param_one(dplyr_function, input_data_frame)&lt;/code&gt;&lt;/li&gt;&lt;/ul&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/03-dplyr/01-introduction-to-dplyr/quiz-2"&gt;Start Quiz&lt;/a&gt;
&lt;p&gt;Introduction to dplyr is an excerpt from the course &lt;a href="https://www.quantargo.com/courses/course-r-introduction"&gt;Introduction to R&lt;/a&gt;, which is available for free at &lt;a href="https://www.quantargo.com"&gt;quantargo.com&lt;/a&gt;&lt;/p&gt;&lt;p&gt;&lt;a href="https://www.quantargo.com/courses/course-r-introduction"&gt;VIEW FULL COURSE&lt;/a&gt;</description></item><item><title>Select first or last rows of a data frame</title><link>https://www.quantargo.com/courses/course-r-introduction/02-data-frames-tibbles/05-select-first-last-rows</link><pubDate>Fri, 12 Jun 2020 07:18:46 +0000</pubDate><guid>https://www.quantargo.com/courses/course-r-introduction/02-data-frames-tibbles/05-select-first-last-rows</guid><category>R</category><category>Course</category><category>Introduction</category><category>Basics</category><category>Interactive</category><description>&lt;img src="https://www.quantargo.com/assets/courses/course-r-introduction/02-data-frames-tibbles/05-select-first-last-rows.png"&gt;
&lt;p&gt;We often do not need to look at all the contents of a data frame in the console. Instead, a part of it is often sufficient, such as the top or bottom rows retrieved through the &lt;code&gt;head()&lt;/code&gt; and &lt;code&gt;tail()&lt;/code&gt; functions.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Select the top of a data frame&lt;/li&gt;
&lt;li&gt;Select the bottom of a data frame&lt;/li&gt;
&lt;li&gt;Specify the number of lines to select through the parameter &lt;code&gt;n&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;head(___, n = ___)
tail(___, n = ___)&lt;/pre&gt;
&lt;h2&gt;Selecting the top of a data frame&lt;/h2&gt;
&lt;pre&gt;head(___, n = ___)&lt;/pre&gt;
&lt;p&gt;Data frames can span a large number of rows and columns. Based on the printed output in the console it can be hard to get an initial impression of the data inside the data frame. This issue is not so much of a problem for tibbles which have a nicer console output. Additionally, it can be helpful to easily retrieve the first rows in one command without any indexing or additional packages.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;TitanicSurvival&lt;/code&gt; dataset contains data of 1309 passengers represented as rows. A simple print of the dataset would print all passengers, filling up the entire console. Instead, the &lt;code&gt;head()&lt;/code&gt; function shows only the first six rows of a data frame by default, including its column names:&lt;/p&gt;
&lt;pre&gt;head(TitanicSurvival)&lt;/pre&gt;
&lt;pre&gt;                                survived    sex     age
Allen, Miss. Elisabeth Walton        yes female 29.0000
Allison, Master. Hudson Trevor       yes   male  0.9167
Allison, Miss. Helen Loraine          no female  2.0000
Allison, Mr. Hudson Joshua Crei       no   male 30.0000
Allison, Mrs. Hudson J C (Bessi       no female 25.0000
Anderson, Mr. Harry                  yes   male 48.0000
                                passengerClass
Allen, Miss. Elisabeth Walton              1st
Allison, Master. Hudson Trevor             1st
Allison, Miss. Helen Loraine               1st
Allison, Mr. Hudson Joshua Crei            1st
Allison, Mrs. Hudson J C (Bessi            1st
Anderson, Mr. Harry                        1st&lt;/pre&gt;
&lt;p&gt;The number of rows can be set using the parameter &lt;code&gt;n&lt;/code&gt;. To extract only the first three rows from the data set, you can write:&lt;/p&gt;
&lt;pre&gt;head(TitanicSurvival, n = 3)&lt;/pre&gt;
&lt;pre&gt;                               survived    sex     age
Allen, Miss. Elisabeth Walton       yes female 29.0000
Allison, Master. Hudson Trevor      yes   male  0.9167
Allison, Miss. Helen Loraine         no female  2.0000
                               passengerClass
Allen, Miss. Elisabeth Walton             1st
Allison, Master. Hudson Trevor            1st
Allison, Miss. Helen Loraine              1st&lt;/pre&gt;
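&lt;p&gt;The default of six rows and the effect of &lt;code&gt;n&lt;/code&gt; can be checked by counting rows. This sketch uses &lt;code&gt;mtcars&lt;/code&gt;, a data frame that ships with R, as a stand-in for &lt;code&gt;TitanicSurvival&lt;/code&gt; so that it runs without installing additional packages:&lt;/p&gt;
&lt;pre&gt;# mtcars ships with R and stands in for TitanicSurvival here
rows_default &lt;- nrow(head(mtcars))          # no n given: default of 6 rows
rows_three   &lt;- nrow(head(mtcars, n = 3))   # n = 3: first 3 rows

rows_default
rows_three

# The number of columns is not affected by n
ncol(head(mtcars, n = 3)) == ncol(mtcars)&lt;/pre&gt;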
&lt;h2&gt;Exercise: Select the top of a data frame&lt;/h2&gt;
&lt;p&gt;The &lt;code&gt;salaries_sort&lt;/code&gt; dataset contains the 2008-09 nine-month academic salary for professors from a college in the US. The dataset is sorted by &lt;code&gt;salary&lt;/code&gt; in ascending order.&lt;/p&gt;
&lt;p&gt;Inspect the 10 lowest paid professors by selecting the first 10 rows using the &lt;code&gt;head()&lt;/code&gt; function.&lt;/p&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/02-data-frames-tibbles/05-select-first-last-rows/exercise-02-05-01"&gt;Start Exercise&lt;/a&gt;
&lt;h2&gt;Selecting the bottom of a data frame&lt;/h2&gt;
&lt;pre&gt;tail(___, n = ___)&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;tail()&lt;/code&gt; function can be used to select the bottom rows of a data frame. Similar to the &lt;code&gt;head()&lt;/code&gt; function, it also accepts a parameter &lt;code&gt;n&lt;/code&gt; to specify the number of rows to be returned.&lt;/p&gt;
&lt;p&gt;For example, to select the last five rows from the &lt;code&gt;TitanicSurvival&lt;/code&gt; dataset you can write:&lt;/p&gt;
&lt;pre&gt;tail(TitanicSurvival, n = 5)&lt;/pre&gt;
&lt;pre&gt;                          survived    sex  age passengerClass
Zabour, Miss. Hileni            no female 14.5            3rd
Zabour, Miss. Thamine           no female   NA            3rd
Zakarian, Mr. Mapriededer       no   male 26.5            3rd
Zakarian, Mr. Ortin             no   male 27.0            3rd
Zimmerman, Mr. Leo              no   male 29.0            3rd&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;head()&lt;/code&gt; and &lt;code&gt;tail()&lt;/code&gt; functions can also be combined to select a fragment from the middle of the data set. To select the first five rows from the bottom 500 rows you can write:&lt;/p&gt;
&lt;pre&gt;head(tail(TitanicSurvival, n = 500), n = 5)&lt;/pre&gt;
&lt;pre&gt;                                survived    sex age passengerClass
Ford, Mr. Edward Watson               no   male  18            3rd
Ford, Mr. William Neal                no   male  16            3rd
Ford, Mrs. Edward (Margaret Ann       no female  48            3rd
Fox, Mr. Patrick                      no   male  NA            3rd
Franklin, Mr. Charles (Charles        no   male  NA            3rd&lt;/pre&gt;
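&lt;p&gt;One way to see which rows a combined call returns is to compare it with plain row indexing. This is a minimal sketch that uses the built-in &lt;code&gt;mtcars&lt;/code&gt; data frame as a stand-in (an assumption made so that the example is self-contained), with smaller values for &lt;code&gt;n&lt;/code&gt; because &lt;code&gt;mtcars&lt;/code&gt; has only 32 rows:&lt;/p&gt;
&lt;pre&gt;# mtcars ships with R and stands in for TitanicSurvival here
n_rows &lt;- nrow(mtcars)

# First 3 rows out of the bottom 10 rows
middle &lt;- head(tail(mtcars, n = 10), n = 3)

# The same rows selected by their positions
by_index &lt;- mtcars[(n_rows - 9):(n_rows - 7), ]

identical(middle, by_index)&lt;/pre&gt;
&lt;p&gt;The bottom 10 rows start at position &lt;code&gt;n_rows - 9&lt;/code&gt;, so the first three of them end at &lt;code&gt;n_rows - 7&lt;/code&gt;. The nested call saves this index arithmetic.&lt;/p&gt;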
&lt;h2&gt;Exercise: Select the bottom of a data frame&lt;/h2&gt;
&lt;p&gt;The &lt;code&gt;salaries_sort&lt;/code&gt; dataset contains the 2008-09 nine-month academic salary for professors from a college in the US. The dataset is sorted by &lt;code&gt;salary&lt;/code&gt; in ascending order.&lt;/p&gt;
&lt;p&gt;Inspect the 20 highest paid professors by selecting the last 20 rows using the &lt;code&gt;tail()&lt;/code&gt; function.&lt;/p&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/02-data-frames-tibbles/05-select-first-last-rows/exercise-02-05-02"&gt;Start Exercise&lt;/a&gt;
&lt;h2&gt;Exercise: Select the top from the bottom data frame&lt;/h2&gt;
&lt;p&gt;The &lt;code&gt;salaries_sort&lt;/code&gt; dataset contains the 2008-09 nine-month academic salary for 397 professors from a college in the US. The dataset is sorted by &lt;code&gt;salary&lt;/code&gt; in ascending order.&lt;/p&gt;
&lt;p&gt;Inspect the 10 professors around the median salary by&lt;/p&gt;
&lt;ol style="list-style-type: decimal"&gt;
&lt;li&gt;Selecting the bottom 200 professors using the &lt;code&gt;tail()&lt;/code&gt; function&lt;/li&gt;
&lt;li&gt;Selecting the top 10 professors out of the bottom 200&lt;/li&gt;
&lt;/ol&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/02-data-frames-tibbles/05-select-first-last-rows/exercise-02-05-03"&gt;Start Exercise&lt;/a&gt;
&lt;p&gt;Select first or last rows of a data frame is an excerpt from the course &lt;a href="https://www.quantargo.com/courses/course-r-introduction"&gt;Introduction to R&lt;/a&gt;, which is available for free at &lt;a href="https://www.quantargo.com"&gt;quantargo.com&lt;/a&gt;&lt;/p&gt;&lt;p&gt;&lt;a href="https://www.quantargo.com/courses/course-r-introduction"&gt;VIEW FULL COURSE&lt;/a&gt;</description></item><item><title>Determine the size of a data frame</title><link>https://www.quantargo.com/courses/course-r-introduction/02-data-frames-tibbles/04-determine-size-data-frame</link><pubDate>Tue, 09 Jun 2020 10:26:51 +0000</pubDate><guid>https://www.quantargo.com/courses/course-r-introduction/02-data-frames-tibbles/04-determine-size-data-frame</guid><category>R</category><category>Course</category><category>Introduction</category><category>Basics</category><category>Interactive</category><description>&lt;img src="https://www.quantargo.com/assets/courses/course-r-introduction/02-data-frames-tibbles/04-determine-size-data-frame.png"&gt;
&lt;p&gt;The size of a data frame, like the number of rows or columns, is often required and can be determined in various ways.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Get number of rows of a data frame&lt;/li&gt;
&lt;li&gt;Get number of columns of a data frame&lt;/li&gt;
&lt;li&gt;Get dimensions of a data frame&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;nrow(___)
ncol(___)
dim(___)
length(___)&lt;/pre&gt;
&lt;h2&gt;Data Frame Dimensions&lt;/h2&gt;
&lt;p&gt;The number of rows and columns in a data frame can be worked out from its printed output. However, it is much easier to get this information directly through functions, and the returned values can then be used elsewhere in your code.&lt;/p&gt;
&lt;p&gt;Data frames have two dimensions. The number of rows is considered to be the first dimension. It typically defines the number of observations in a data set. To get the number of rows of the &lt;code&gt;Davis&lt;/code&gt; data frame from the &lt;strong&gt;carData&lt;/strong&gt; package, use the &lt;code&gt;nrow()&lt;/code&gt; function:&lt;/p&gt;
&lt;pre&gt;nrow(Davis)&lt;/pre&gt;
&lt;pre&gt;[1] 200&lt;/pre&gt;
&lt;p&gt;Similarly, the number of columns or &lt;em&gt;attributes&lt;/em&gt; of the data frame can be retrieved with &lt;code&gt;ncol()&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;ncol(Davis)&lt;/pre&gt;
&lt;pre&gt;[1] 5&lt;/pre&gt;
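&lt;p&gt;As a minimal sketch of how these values can be reused in code, the example below stores them in variables and uses the row count in further steps. It assumes the built-in &lt;code&gt;mtcars&lt;/code&gt; data frame instead of &lt;code&gt;Davis&lt;/code&gt;, and all variable names are made up for illustration.&lt;/p&gt;
&lt;pre&gt;# mtcars is a built-in data frame, used here only for illustration
n_obs &lt;- nrow(mtcars)   # number of rows (observations)
n_attr &lt;- ncol(mtcars)  # number of columns (attributes)

# Use the row count in a calculation: share of cars with more than 20 mpg
share_efficient &lt;- sum(mtcars$mpg &gt; 20) / n_obs

# Use the row count to address the last row without hard-coding its index
last_row &lt;- mtcars[n_obs, ]&lt;/pre&gt;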
&lt;h2&gt;Exercise: Determine number of elements in data frame&lt;/h2&gt;
&lt;pre&gt;                              survived    sex age passengerClass
Allen, Miss. Elisabeth Walton      yes female  29            1st
 [ reached 'max' / getOption("max.print") -- omitted 1308 rows ]&lt;/pre&gt;
&lt;p&gt;Determine the number of data values in the &lt;code&gt;TitanicSurvival&lt;/code&gt; data frame above given as the number of rows multiplied by the number of columns.&lt;/p&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/02-data-frames-tibbles/04-determine-size-data-frame/exercise-02-03-01"&gt;Start Exercise&lt;/a&gt;
&lt;h2&gt;Retrieving data frame dimensions&lt;/h2&gt;
&lt;p&gt;To retrieve the size of all dimensions of a data frame at once you can use the &lt;code&gt;dim()&lt;/code&gt; function. &lt;code&gt;dim()&lt;/code&gt; returns a vector with two elements: the first is the number of rows and the second is the number of columns.&lt;/p&gt;
&lt;p&gt;For example, the dimensions of the &lt;code&gt;Davis&lt;/code&gt; dataset can be retrieved as:&lt;/p&gt;
&lt;pre&gt;dim(Davis)&lt;/pre&gt;
&lt;pre&gt;[1] 200   5&lt;/pre&gt;
&lt;p&gt;In addition to data frames, &lt;code&gt;dim()&lt;/code&gt; can also be used for other multi-dimensional R objects such as matrices or arrays. However, when used with a plain vector, &lt;code&gt;dim()&lt;/code&gt; returns &lt;code&gt;NULL&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;dim(c(1, 3, 5, 7))&lt;/pre&gt;
&lt;pre&gt;NULL&lt;/pre&gt;
&lt;p&gt;Instead, the length of a vector is determined through &lt;code&gt;length()&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;length(c(1, 3, 5, 7))&lt;/pre&gt;
&lt;pre&gt;[1] 4&lt;/pre&gt;
&lt;p&gt;In the case of a data frame, &lt;code&gt;length()&lt;/code&gt; returns its number of columns, because a data frame is stored as a list of column vectors:&lt;/p&gt;
&lt;pre&gt;length(Davis)&lt;/pre&gt;
&lt;pre&gt;[1] 5&lt;/pre&gt;
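&lt;p&gt;The behaviour of &lt;code&gt;dim()&lt;/code&gt; for matrices and arrays mentioned above can be illustrated with a short sketch. The objects &lt;code&gt;m&lt;/code&gt; and &lt;code&gt;a&lt;/code&gt; are made up for this example:&lt;/p&gt;
&lt;pre&gt;m &lt;- matrix(1:6, nrow = 2)           # 2 rows, 3 columns
a &lt;- array(1:24, dim = c(2, 3, 4))   # three dimensions

dim(m)     # 2 3
dim(a)     # 2 3 4
length(m)  # 6: for a matrix, length() counts all elements&lt;/pre&gt;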
&lt;h2&gt;Quiz: Data Frame Dimensions&lt;/h2&gt;
&lt;pre&gt;dim(Florida)&lt;/pre&gt;
What does the above command return for the data set &lt;code&gt;Florida&lt;/code&gt; from the &lt;strong&gt;carData&lt;/strong&gt; package which has 11 columns and 67 rows?
&lt;ul&gt;&lt;li&gt;&lt;code&gt;67&lt;/code&gt; &lt;code&gt;11&lt;/code&gt;&lt;/li&gt;&lt;li&gt;&lt;code&gt;11&lt;/code&gt; &lt;code&gt;67&lt;/code&gt;&lt;/li&gt;&lt;li&gt;&lt;code&gt;11&lt;/code&gt;&lt;/li&gt;&lt;li&gt;&lt;code&gt;67&lt;/code&gt;&lt;/li&gt;&lt;/ul&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/02-data-frames-tibbles/04-determine-size-data-frame/quiz-1"&gt;Start Quiz&lt;/a&gt;
&lt;p&gt;Determine the size of a data frame is an excerpt from the course &lt;a href="https://www.quantargo.com/courses/course-r-introduction"&gt;Introduction to R&lt;/a&gt;, which is available for free at &lt;a href="https://www.quantargo.com"&gt;quantargo.com&lt;/a&gt;&lt;/p&gt;&lt;p&gt;&lt;a href="https://www.quantargo.com/courses/course-r-introduction"&gt;VIEW FULL COURSE&lt;/a&gt;</description></item><item><title>Extract or replace columns in a data frame using `$`</title><link>https://www.quantargo.com/courses/course-r-introduction/02-data-frames-tibbles/03-extracting-columns</link><pubDate>Tue, 02 Jun 2020 21:35:52 +0000</pubDate><guid>https://www.quantargo.com/courses/course-r-introduction/02-data-frames-tibbles/03-extracting-columns</guid><category>R</category><category>Course</category><category>Introduction</category><category>Basics</category><category>Interactive</category><description>&lt;img src="https://www.quantargo.com/assets/courses/course-r-introduction/02-data-frames-tibbles/03-extracting-columns.png"&gt;
&lt;p&gt;Columns in a data frame can be easily extracted and manipulated with the &lt;code&gt;$&lt;/code&gt; operator. Even new columns can be added by assigning a vector.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Extract columns from a data frame with the &lt;code&gt;$&lt;/code&gt; operator.&lt;/li&gt;
&lt;li&gt;Replace values of existing columns in a data frame.&lt;/li&gt;
&lt;li&gt;Add new columns to a data frame.&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;___$___
___$___  &lt;- ___&lt;/pre&gt;
&lt;h2&gt;Extract columns with the $&lt;/h2&gt;
&lt;p&gt;Data frames are tables resulting from the combination of column vectors. Users can interact with data frames through numerous operators to extract, add or recombine values. To extract single columns from a data frame R offers a very specific operator: the dollar &lt;code&gt;$&lt;/code&gt;. It takes the data frame written before the &lt;code&gt;$&lt;/code&gt; and returns the column whose name is written after it, as a vector.&lt;/p&gt;
&lt;p&gt;To see the &lt;code&gt;$&lt;/code&gt; operator in action let’s extract the population &lt;code&gt;pop&lt;/code&gt; (in 1,000s) of the US states from the &lt;code&gt;States&lt;/code&gt; dataset (from 1992) in the &lt;strong&gt;carData&lt;/strong&gt; package:&lt;/p&gt;
&lt;pre&gt;States$pop&lt;/pre&gt;
&lt;pre&gt; [1]  4041   550  3665  2351 29760  3294  3287   666   607 12938
[11]  6478  1108  1007 11431  5544  2777  2478  3685  4220  1228
[21]  4781  6016  9295  4375  2573  5117   799  1578  1202  1109
[31]  7730  1515 17990  6629   639 10847  3146  2842 11882  1003
[41]  3487   696  4877 16987  1723   563  6187  4867  1793  4892
[51]   454&lt;/pre&gt;
&lt;p&gt;The command extracts the population column as a vector from the data frame. From this vector we can calculate the total population with &lt;code&gt;sum()&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;sum(States$pop)&lt;/pre&gt;
&lt;pre&gt;[1] 248709&lt;/pre&gt;
&lt;p&gt;Similarly, the average salary (in $1,000) for teachers can be calculated as the &lt;code&gt;mean()&lt;/code&gt; from the &lt;code&gt;pay&lt;/code&gt; column:&lt;/p&gt;
&lt;pre&gt;mean(States$pay)&lt;/pre&gt;
&lt;pre&gt;[1] 30.94118&lt;/pre&gt;
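&lt;p&gt;The &lt;code&gt;$&lt;/code&gt; operator can also be used on the left-hand side of an assignment to replace an existing column or to add a new one. The following is a minimal sketch on a small, made-up data frame; the names &lt;code&gt;people&lt;/code&gt;, &lt;code&gt;height_cm&lt;/code&gt; and &lt;code&gt;height_m&lt;/code&gt; are chosen for illustration only:&lt;/p&gt;
&lt;pre&gt;people &lt;- data.frame(
  name = c("Louisa", "Jonathan", "Luigi"),
  height_cm = c(165, 180, 172)
)

# Replace the values of an existing column
people$height_cm &lt;- c(166, 180, 173)

# Add a new column derived from an existing one
people$height_m &lt;- people$height_cm / 100

people&lt;/pre&gt;
&lt;p&gt;The assigned vector needs to have as many elements as the data frame has rows, or a single value that is repeated for every row.&lt;/p&gt;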
&lt;h2&gt;Quiz: Extract column from a data frame&lt;/h2&gt;
&lt;pre&gt;      rank discipline yrs.since.phd yrs.service  sex salary
1     Prof          B            19          18 Male 139750
2     Prof          B            20          16 Male 173200
3 AsstProf          B             4           3 Male  79750
4     Prof          B            45          39 Male 115000
5     Prof          B            40          41 Male 141500
 [ reached 'max' / getOption("max.print") -- omitted 392 rows ]&lt;/pre&gt;
Which R command can be used to calculate the average &lt;code&gt;salary&lt;/code&gt; of professors in the &lt;code&gt;Salaries&lt;/code&gt; dataset from the &lt;strong&gt;carData&lt;/strong&gt; package?
&lt;ul&gt;&lt;li&gt;&lt;code&gt;mean(Salaries$salary)&lt;/code&gt;&lt;/li&gt;&lt;li&gt;&lt;code&gt;mean(salary$Salaries)&lt;/code&gt;&lt;/li&gt;&lt;li&gt;&lt;code&gt;Salaries(mean$salary)&lt;/code&gt;&lt;/li&gt;&lt;li&gt;&lt;code&gt;TitanicSurvival(age$mean)&lt;/code&gt;&lt;/li&gt;&lt;/ul&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/02-data-frames-tibbles/03-extracting-columns/quiz-1"&gt;Start Quiz&lt;/a&gt;
&lt;h2&gt;Exercise: Extract column from a data frame&lt;/h2&gt;
&lt;p&gt;Calculate the average &lt;code&gt;age&lt;/code&gt; of passengers in the &lt;code&gt;TitanicSurvival&lt;/code&gt; dataset from the &lt;strong&gt;carData&lt;/strong&gt; package. The &lt;strong&gt;carData&lt;/strong&gt; package is already loaded.&lt;/p&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/02-data-frames-tibbles/03-extracting-columns/exercise-02-03-01"&gt;Start Exercise&lt;/a&gt;
&lt;p&gt;Extract or replace columns in a data frame using `$` is an excerpt from the course &lt;a href="https://www.quantargo.com/courses/course-r-introduction"&gt;Introduction to R&lt;/a&gt;, which is available for free at &lt;a href="https://www.quantargo.com"&gt;quantargo.com&lt;/a&gt;&lt;/p&gt;&lt;p&gt;&lt;a href="https://www.quantargo.com/courses/course-r-introduction"&gt;VIEW FULL COURSE&lt;/a&gt;</description></item><item><title>QBits Workspace: A New Online Editor to Share and Deploy R Code</title><link>https://www.quantargo.com/blog/post/2020-05-27-introducing-qbits-editor</link><pubDate>Wed, 27 May 2020 00:00:00 +0000</pubDate><guid>https://www.quantargo.com/blog/post/2020-05-27-introducing-qbits-editor</guid><category>R</category><category>Blog</category><description>&lt;h2&gt;QBits Workspace: A New Online Editor to Share and Deploy R Code&lt;/h2&gt;
&lt;img src="https://www.quantargo.com/assets/blog/2020-05-27-introducing-qbits-editor/qbit_workspace_demo.gif"&gt;
&lt;p&gt;Today we are excited to announce the &lt;a href="https://www.quantargo.com/qbits"&gt;QBits Workspace&lt;/a&gt; to run and deploy R code in the browser. QBits enable you to run R in a serverless cloud environment and provide an easy and cost-effective way to develop, run, deploy and share data science projects at scale without the need to manage servers, software setup and package installations. They start up instantly, have very quick deployment times and can handle all sorts of data science projects. In fact, QBits already power our &lt;a href="https://www.quantargo.com/courses/course-r-introduction"&gt;online course platform&lt;/a&gt; and even more exciting use cases will follow soon.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Why QBits&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;We created QBits to make the deployment experience for data scientists easier. Too many projects fail because data scientists struggle to deploy their results. Think of a simple &lt;strong&gt;ggplot2&lt;/strong&gt; example to reproduce the &lt;code&gt;gapminder&lt;/code&gt; plots from &lt;a href="https://www.ted.com/talks/hans_rosling_the_best_stats_you_ve_ever_seen"&gt;Hans Rosling’s excellent presentation&lt;/a&gt;:&lt;/p&gt;
&lt;pre&gt;library(ggplot2)
library(dplyr)
library(gapminder)

gapminder_2007 &lt;- filter(gapminder, year == 2007)
gapminder_2007$pop &lt;- gapminder_2007$pop/1e6
ggplot(gapminder_2007) + 
  geom_point(aes(x = gdpPercap, y = lifeExp, 
                 color = continent,
                 size = pop),
  alpha = 0.7) + 
  scale_size_area(max_size = 15)&lt;/pre&gt;
&lt;img src="https://www.quantargo.com/assets/blog/2020-05-27-introducing-qbits-editor/2020-05-27-introducing-qbits-editor_files/figure-html/unnamed-chunk-1-1.png"&gt;
&lt;p&gt;This plot runs fine locally. However, to reproduce the plot in some interactive web application allowing users to filter the dataset by e.g. &lt;code&gt;year == 1952&lt;/code&gt; we need to&lt;/p&gt;
&lt;ol style="list-style-type: decimal"&gt;
&lt;li&gt;Create a Docker container choosing the &lt;em&gt;right&lt;/em&gt; operating system.&lt;/li&gt;
&lt;li&gt;Install the correct language runtime, e.g. R 4.0.0.&lt;/li&gt;
&lt;li&gt;Install all package dependencies (e.g. &lt;strong&gt;ggplot2&lt;/strong&gt;, &lt;strong&gt;dplyr&lt;/strong&gt;, &lt;strong&gt;gapminder&lt;/strong&gt;).&lt;/li&gt;
&lt;li&gt;Create a Shiny application or Plumber API for interactive or programmatic use.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Even for this simple example the deployment overhead is considerable. This creates a deployment bottleneck that leaves many data science projects unfinished and many data scientists frustrated. The big difference with QBits is that they already provide the correct container, language runtime and packages. The only thing you have to do is to put your code on top. That’s it.&lt;/p&gt;
&lt;p&gt;The &lt;strong&gt;QBits Workspace&lt;/strong&gt; provides a development environment to rapidly develop your custom QBits. Since you are already &lt;strong&gt;working within your custom container&lt;/strong&gt;, the final deployment is then only a matter of seconds, not weeks.&lt;/p&gt;
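&lt;p&gt;Whichever route is taken, the plotting code itself first has to accept the year as an input. A minimal sketch of that step is shown below. It assumes a hypothetical helper called &lt;code&gt;plot_gapminder()&lt;/code&gt; (a name made up here, not part of QBits or of any package) that wraps the code from above:&lt;/p&gt;
&lt;pre&gt;library(ggplot2)
library(dplyr)
library(gapminder)

# Hypothetical helper: draw the Gapminder scatter plot for one year
plot_gapminder &lt;- function(selected_year) {
  gapminder_year &lt;- filter(gapminder, year == selected_year)
  gapminder_year$pop &lt;- gapminder_year$pop / 1e6  # population in millions
  ggplot(gapminder_year) +
    geom_point(aes(x = gdpPercap, y = lifeExp,
                   color = continent,
                   size = pop),
               alpha = 0.7) +
    scale_size_area(max_size = 15)
}

plot_gapminder(1952)&lt;/pre&gt;
&lt;p&gt;The argument is deliberately not called &lt;code&gt;year&lt;/code&gt;, since inside &lt;code&gt;filter()&lt;/code&gt; that name would refer to the column and the comparison would be true for every row.&lt;/p&gt;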
&lt;p&gt;Check out the previous example &lt;a href="https://www.quantargo.com/qbits/qbit-course-r-introduction%2304-ggplot%2303-scatterplot-additional-aesthetics%23exercise-03-05"&gt;&lt;em&gt;Reproduce Gapminder scatter plot&lt;/em&gt; within the QBits Workspace here&lt;/a&gt;.&lt;/p&gt;
&lt;img src="https://www.quantargo.com/assets/blog/2020-05-27-introducing-qbits-editor/qbit_workspace_demo.gif"&gt;
&lt;div id="whats-next" class="section level3"&gt;
&lt;h3&gt;What’s Next&lt;/h3&gt;
&lt;p&gt;We are hard at work expanding the editor to fit more workflows and implementing new features. Further updates will introduce the possibility to&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Create your own QBits&lt;/li&gt;
&lt;li&gt;Add and remove packages (all 15,000+ CRAN packages are available)&lt;/li&gt;
&lt;li&gt;Deploy QBits, including versioning&lt;/li&gt;
&lt;li&gt;… and more (yes, Python is coming as well)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For now, &lt;a href="https://www.quantargo.com/qbits#showcase"&gt;head over to our playgrounds&lt;/a&gt; and give them a try.&lt;/p&gt;
&lt;p&gt;We would love to hear your feedback and feature requests:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Write to us at &lt;a href="mailto:support@quantargo.com"&gt;support@quantargo.com&lt;/a&gt; or&lt;/li&gt;
&lt;li&gt;Hit us up on &lt;a href="https://twitter.com/quantargo"&gt;Twitter&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Cheers,&lt;/p&gt;
&lt;p&gt;Your Quantargo Team&lt;/p&gt;
&lt;/div&gt;</description></item><item><title>Create and convert tibbles</title><link>https://www.quantargo.com/courses/course-r-introduction/02-data-frames-tibbles/02-creating-and-converting-tibbles</link><pubDate>Fri, 22 May 2020 18:12:30 +0000</pubDate><guid>https://www.quantargo.com/courses/course-r-introduction/02-data-frames-tibbles/02-creating-and-converting-tibbles</guid><category>R</category><category>Course</category><category>Introduction</category><category>Basics</category><category>Interactive</category><description>&lt;img src="https://www.quantargo.com/assets/courses/course-r-introduction/02-data-frames-tibbles/02-creating-and-converting-tibbles.png"&gt;
&lt;p&gt;Tibbles are a modern reimagining of data frames and have much in common with them. The most visible difference is how tibble contents are printed to the console. Tibbles are part of the tidyverse and are used for their more consistent behaviour compared to data frames.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Learn the difference between &lt;em&gt;data frames&lt;/em&gt; and &lt;em&gt;tibbles&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Create &lt;em&gt;tibbles&lt;/em&gt; from vectors&lt;/li&gt;
&lt;li&gt;Convert &lt;em&gt;data frames&lt;/em&gt; into &lt;em&gt;tibbles&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;tibble(___ = ___, 
       ___ = ___, 
       ...)
as_tibble(___)&lt;/pre&gt;
&lt;h2&gt;Introduction to Tibbles&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;A modern reimagining of the data frame&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://tibble.tidyverse.org" class="uri"&gt;https://tibble.tidyverse.org&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Tibbles are in many ways similar to data frames. In fact, they &lt;em&gt;inherit&lt;/em&gt; from data frames, which means that all functions and features available for data frames also work for tibbles. Therefore, when we speak of &lt;em&gt;data frames&lt;/em&gt; we also mean &lt;em&gt;tibbles&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;In addition to everything a data frame has to offer, tibbles have a more consistent behaviour with better usability in many cases. Most importantly, when a tibble object is printed to the console it automatically shows only the first 10 rows and condenses additional columns. By contrast, a data frame fills up the entire console screen with values, which can lead to confusion. Let’s take a look at the &lt;code&gt;gapminder&lt;/code&gt; dataset from the &lt;strong&gt;gapminder&lt;/strong&gt; package:&lt;/p&gt;
&lt;pre&gt;gapminder&lt;/pre&gt;
&lt;pre&gt;# A tibble: 1,704 x 6
   country     continent  year lifeExp      pop gdpPercap
   &lt;fct&gt;       &lt;fct&gt;     &lt;int&gt;   &lt;dbl&gt;    &lt;int&gt;     &lt;dbl&gt;
 1 Afghanistan Asia       1952    28.8  8425333      779.
 2 Afghanistan Asia       1957    30.3  9240934      821.
 3 Afghanistan Asia       1962    32.0 10267083      853.
 4 Afghanistan Asia       1967    34.0 11537966      836.
 5 Afghanistan Asia       1972    36.1 13079460      740.
 6 Afghanistan Asia       1977    38.4 14880372      786.
 7 Afghanistan Asia       1982    39.9 12881816      978.
 8 Afghanistan Asia       1987    40.8 13867957      852.
 9 Afghanistan Asia       1992    41.7 16317921      649.
10 Afghanistan Asia       1997    41.8 22227415      635.
# … with 1,694 more rows&lt;/pre&gt;
&lt;p&gt;On the top line we immediately see that the &lt;code&gt;gapminder&lt;/code&gt; dataset is a tibble consisting of 1,704 rows and 6 columns. The second line shows the column names, with their corresponding &lt;em&gt;data types&lt;/em&gt; directly below.&lt;/p&gt;
&lt;p&gt;For example, the column &lt;code&gt;country&lt;/code&gt; has the type &lt;code&gt;&amp;lt;fct&amp;gt;&lt;/code&gt; (which is short for “factor”), &lt;code&gt;year&lt;/code&gt; is an integer &lt;code&gt;&amp;lt;int&amp;gt;&lt;/code&gt; and life expectancy &lt;code&gt;lifeExp&lt;/code&gt; is a &lt;code&gt;&amp;lt;dbl&amp;gt;&lt;/code&gt;—a decimal number.&lt;/p&gt;
&lt;h2&gt;Quiz: Tibbles versus Data Frames&lt;/h2&gt;
Which answers about data frames and tibbles are correct?
&lt;ul&gt;&lt;li&gt;The printed output to the console is the same for tibbles and data frames&lt;/li&gt;&lt;li&gt;All functions defined for data frames also work on tibbles.&lt;/li&gt;&lt;li&gt;Tibbles also show the data types in the console output.&lt;/li&gt;&lt;li&gt;To use tibble objects the &lt;strong&gt;tibbles&lt;/strong&gt; package needs to be loaded.&lt;/li&gt;&lt;li&gt;The table dimensions are not shown in the console output for tibbles.&lt;/li&gt;&lt;/ul&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/02-data-frames-tibbles/02-creating-and-converting-tibbles/quiz-1"&gt;Start Quiz&lt;/a&gt;
&lt;h2&gt;Creating Tibbles&lt;/h2&gt;
&lt;pre&gt;tibble(___ = ___, 
       ___ = ___, 
       ...)
as_tibble(___)&lt;/pre&gt;
&lt;p&gt;Creating tibbles follows the same pattern as creating data frames. We can use the &lt;code&gt;tibble()&lt;/code&gt; function from the &lt;strong&gt;tibble&lt;/strong&gt; package to create a new tabular object.&lt;/p&gt;
&lt;p&gt;For example, a tibble with three columns describing four different people can be created like this:&lt;/p&gt;
&lt;pre&gt;library(tibble)
tibble(
  id = c(1, 2, 3, 4),
  name = c("Louisa", "Jonathan", "Luigi", "Rachel"),
  female = c(TRUE, FALSE, FALSE, TRUE)
)&lt;/pre&gt;
&lt;pre&gt;# A tibble: 4 x 3
     id name     female
  &lt;dbl&gt; &lt;chr&gt;    &lt;lgl&gt; 
1     1 Louisa   TRUE  
2     2 Jonathan FALSE 
3     3 Luigi    FALSE 
4     4 Rachel   TRUE  &lt;/pre&gt;
&lt;h2&gt;Converting data frames to Tibbles&lt;/h2&gt;
&lt;p&gt;If you prefer tibbles to data frames for their additional features, existing data frames can be converted to tibbles with the &lt;code&gt;as_tibble()&lt;/code&gt; function.&lt;/p&gt;
&lt;p&gt;For example, the &lt;code&gt;Davis&lt;/code&gt; data frame from the &lt;strong&gt;carData&lt;/strong&gt; package can be converted to a tibble like so:&lt;/p&gt;
&lt;pre&gt;as_tibble(Davis)&lt;/pre&gt;
&lt;pre&gt;# A tibble: 200 x 5
   sex   weight height repwt repht
   &lt;fct&gt;  &lt;int&gt;  &lt;int&gt; &lt;int&gt; &lt;int&gt;
 1 M         77    182    77   180
 2 F         58    161    51   159
 3 F         53    161    54   158
 4 M         68    177    70   175
 5 F         59    157    59   155
 6 M         76    170    76   165
 7 M         76    167    77   165
 8 M         69    186    73   180
 9 M         71    178    71   175
10 M         65    171    64   170
# … with 190 more rows&lt;/pre&gt;
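&lt;p&gt;To check that a conversion worked, the class of the result can be inspected. The sketch below assumes the built-in &lt;code&gt;iris&lt;/code&gt; data frame rather than &lt;code&gt;Davis&lt;/code&gt;, so that only the &lt;strong&gt;tibble&lt;/strong&gt; package is needed; the variable names are made up for illustration:&lt;/p&gt;
&lt;pre&gt;library(tibble)

# iris is a built-in data frame, used here only for illustration
iris_tbl &lt;- as_tibble(iris)

is_tibble(iris)      # FALSE: a plain data frame
is_tibble(iris_tbl)  # TRUE
class(iris_tbl)      # includes "data.frame", so data frame functions still apply

# Convert back to a plain data frame if needed
iris_df &lt;- as.data.frame(iris_tbl)&lt;/pre&gt;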
&lt;h2&gt;Exercise: Convert data frame to Tibble&lt;/h2&gt;
&lt;pre&gt;  speed dist
1     4    2
2     4   10
3     7    4
 [ reached 'max' / getOption("max.print") -- omitted 47 rows ]&lt;/pre&gt;
&lt;p&gt;The data frame &lt;code&gt;cars&lt;/code&gt; reports the speed of cars and distances taken to stop. To have a nicer printed output in the console use the &lt;code&gt;as_tibble()&lt;/code&gt; function and create a tibble object out of it.&lt;/p&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/02-data-frames-tibbles/02-creating-and-converting-tibbles/exercise-02-01-05"&gt;Start Exercise&lt;/a&gt;
&lt;p&gt;Create and convert tibbles is an excerpt from the course &lt;a href="https://www.quantargo.com/courses/course-r-introduction"&gt;Introduction to R&lt;/a&gt;, which is available for free at &lt;a href="https://www.quantargo.com"&gt;quantargo.com&lt;/a&gt;&lt;/p&gt;&lt;p&gt;&lt;a href="https://www.quantargo.com/courses/course-r-introduction"&gt;VIEW FULL COURSE&lt;/a&gt;</description></item><item><title>Build a data frame from vectors</title><link>https://www.quantargo.com/courses/course-r-introduction/02-data-frames-tibbles/01-build-data-frame-from-vectors</link><pubDate>Mon, 18 May 2020 17:18:16 +0000</pubDate><guid>https://www.quantargo.com/courses/course-r-introduction/02-data-frames-tibbles/01-build-data-frame-from-vectors</guid><category>R</category><category>Course</category><category>Introduction</category><category>Basics</category><category>Interactive</category><description>&lt;img src="https://www.quantargo.com/assets/courses/course-r-introduction/02-data-frames-tibbles/01-build-data-frame-from-vectors.png"&gt;
&lt;p&gt;Tabular data is the most common format used by data scientists. In R, tables are represented through data frames. They can be inspected by printing them to the console.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Understand why data frames are important&lt;/li&gt;
&lt;li&gt;Interpret console output created by a data frame&lt;/li&gt;
&lt;li&gt;Create a new data frame using the &lt;code&gt;data.frame()&lt;/code&gt; function&lt;/li&gt;
&lt;li&gt;Define vectors to be used for single columns&lt;/li&gt;
&lt;li&gt;Specify names of data frame columns&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;data.frame(___ = ___, 
           ___ = ___, 
           ...)&lt;/pre&gt;
&lt;h2&gt;Introduction to Data Frames&lt;/h2&gt;
&lt;p&gt;In analysis and statistics, tabular data is the most important data structure. It is present in many common formats like Excel files, comma separated values (CSV) or databases. R integrates tabular data objects as first-class citizens into the language through &lt;em&gt;data frames&lt;/em&gt;. Data frames allow users to easily read and manipulate tabular data within the R language.&lt;/p&gt;
&lt;p&gt;Let’s take a look at a data frame object named &lt;code&gt;Davis&lt;/code&gt;, from the package &lt;strong&gt;carData&lt;/strong&gt;, which includes height and weight measurements for 200 men and women:&lt;/p&gt;
&lt;pre&gt;Davis&lt;/pre&gt;
&lt;pre&gt;  sex weight height repwt repht
1   M     77    182    77   180
2   F     58    161    51   159
3   F     53    161    54   158
 [ reached 'max' / getOption("max.print") -- omitted 197 rows ]&lt;/pre&gt;
&lt;p&gt;From the printed output we can see that the data frame spans 200 &lt;strong&gt;rows&lt;/strong&gt; (3 printed, 197 omitted) and 5 &lt;strong&gt;columns&lt;/strong&gt;. In the example above, each row describes one person through &lt;strong&gt;attributes&lt;/strong&gt;, which correspond to the columns &lt;code&gt;sex&lt;/code&gt;, &lt;code&gt;weight&lt;/code&gt;, &lt;code&gt;height&lt;/code&gt;, reported weight &lt;code&gt;repwt&lt;/code&gt; and reported height &lt;code&gt;repht&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;For example, the first row in the table describes a &lt;code&gt;M&lt;/code&gt;ale who weighs &lt;code&gt;77&lt;/code&gt;kg and is &lt;code&gt;182&lt;/code&gt;cm tall. The reported values are very close, at &lt;code&gt;77&lt;/code&gt;kg and &lt;code&gt;180&lt;/code&gt;cm, respectively.&lt;/p&gt;
&lt;p&gt;The rows in a data frame are further identified by &lt;em&gt;row names&lt;/em&gt; on the left which are simply the row numbers by default. In the case of the &lt;code&gt;Davis&lt;/code&gt; dataset above the row names range from 1 to 200.&lt;/p&gt;
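&lt;p&gt;Row names can be inspected and changed with the &lt;code&gt;rownames()&lt;/code&gt; function. The sketch below uses a small, made-up data frame called &lt;code&gt;scores&lt;/code&gt; for illustration:&lt;/p&gt;
&lt;pre&gt;scores &lt;- data.frame(points = c(12, 7, 9))
rownames(scores)  # "1" "2" "3": the default row names are the row numbers

# Row names can also be set explicitly (the names below are made up)
rownames(scores) &lt;- c("Louisa", "Jonathan", "Luigi")
scores&lt;/pre&gt;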
&lt;h2&gt;Quiz: Data Frame Output&lt;/h2&gt;
&lt;pre&gt;      rank discipline yrs.since.phd yrs.service  sex salary
1     Prof          B            19          18 Male 139750
2     Prof          B            20          16 Male 173200
3 AsstProf          B             4           3 Male  79750
 [ reached 'max' / getOption("max.print") -- omitted 394 rows ]&lt;/pre&gt;
&lt;p&gt;The data frame above shows the nine-month academic salary for Assistant Professors, Associate Professors and Professors in a college in the U.S.&lt;/p&gt;
Which answers about the data frame printed above are correct?
&lt;ul&gt;&lt;li&gt;The data frame has 3 rows.&lt;/li&gt;&lt;li&gt;The data frame has 394 rows.&lt;/li&gt;&lt;li&gt;The data frame has 397 rows.&lt;/li&gt;&lt;li&gt;The data frame has 6 attributes.&lt;/li&gt;&lt;li&gt;The attribute names contain &lt;code&gt;Prof&lt;/code&gt; and &lt;code&gt;AsstProf&lt;/code&gt;&lt;/li&gt;&lt;/ul&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/02-data-frames-tibbles/01-build-data-frame-from-vectors/quiz-1"&gt;Start Quiz&lt;/a&gt;
&lt;h2&gt;Quiz: Data Frame Output (2)&lt;/h2&gt;
&lt;pre&gt;      rank discipline yrs.since.phd yrs.service  sex salary
1     Prof          B            19          18 Male 139750
2     Prof          B            20          16 Male 173200
3 AsstProf          B             4           3 Male  79750
 [ reached 'max' / getOption("max.print") -- omitted 394 rows ]&lt;/pre&gt;
&lt;p&gt;The data frame above shows the nine-month academic salary for Assistant Professors, Associate Professors and Professors in a college in the U.S.&lt;/p&gt;
Which answers about the first three faculty members are correct?
&lt;ul&gt;&lt;li&gt;All three are male.&lt;/li&gt;&lt;li&gt;The salaries of all three members are about the same.&lt;/li&gt;&lt;li&gt;The Professor in row three is most probably the oldest.&lt;/li&gt;&lt;li&gt;All shown professors are from the same discipline.&lt;/li&gt;&lt;li&gt;The highest salary amongst the three Professors is $139,750.&lt;/li&gt;&lt;/ul&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/02-data-frames-tibbles/01-build-data-frame-from-vectors/quiz-2"&gt;Start Quiz&lt;/a&gt;
&lt;h2&gt;Creating Data Frames&lt;/h2&gt;
&lt;pre&gt;data.frame(___ = ___, 
           ___ = ___, 
           ...)&lt;/pre&gt;
&lt;p&gt;Data frames hold tabular data in various columns or &lt;em&gt;attributes&lt;/em&gt;. Each column is a vector with its own &lt;em&gt;data type&lt;/em&gt;, such as numbers or characters. The &lt;code&gt;data.frame()&lt;/code&gt; function supports the construction of data frame objects by combining different vectors into a table. To form a table, vectors are required to have equal lengths. A data frame can also be seen as a collection of vectors connected together to form a table.&lt;/p&gt;
&lt;p&gt;Let’s create our first data frame with four different people, including their ids, names and an indicator of whether they are female. Each of these attributes is created by a separate vector with a different data type (numeric, character and logical). The attributes are finally combined into a table using the &lt;code&gt;data.frame()&lt;/code&gt; function:&lt;/p&gt;
&lt;pre&gt;data.frame(
  c(1, 2, 3, 4),
  c("Louisa", "Jonathan", "Luigi", "Rachel"),
  c(TRUE, FALSE, FALSE, TRUE)
)&lt;/pre&gt;
&lt;pre&gt;  c.1..2..3..4. c..Louisa....Jonathan....Luigi....Rachel..
1             1                                     Louisa
2             2                                   Jonathan
3             3                                      Luigi
4             4                                     Rachel
  c.TRUE..FALSE..FALSE..TRUE.
1                        TRUE
2                       FALSE
3                       FALSE
4                        TRUE&lt;/pre&gt;
&lt;p&gt;The resulting data frame stores the values of each vector in a different column. It has four rows and three columns. However, the column names printed on the first line seem to include the column values separated by dots which is a very strange naming scheme!&lt;/p&gt;
&lt;p&gt;Column names can be passed to &lt;code&gt;data.frame()&lt;/code&gt; as argument names preceding the column vectors. To improve the column naming of the previous data frame we can write:&lt;/p&gt;
&lt;pre&gt;data.frame(
  id = c(1, 2, 3, 4),
  name = c("Louisa", "Jonathan", "Luigi", "Rachel"),
  female = c(TRUE, FALSE, FALSE, TRUE)
)&lt;/pre&gt;
&lt;pre&gt;  id     name female
1  1   Louisa   TRUE
2  2 Jonathan  FALSE
3  3    Luigi  FALSE
4  4   Rachel   TRUE&lt;/pre&gt;
&lt;p&gt;The resulting data frame includes the column names needed to see the actual meaning of the different columns.&lt;/p&gt;
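&lt;p&gt;To confirm that each column kept the data type of its vector, the types can be inspected after construction. This is a minimal sketch; the variable name &lt;code&gt;people&lt;/code&gt; is made up, and &lt;code&gt;stringsAsFactors = FALSE&lt;/code&gt; is set explicitly as an assumption so that the result does not depend on the R version:&lt;/p&gt;
&lt;pre&gt;people &lt;- data.frame(
  id = c(1, 2, 3, 4),
  name = c("Louisa", "Jonathan", "Luigi", "Rachel"),
  female = c(TRUE, FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE  # keep names as character vectors in R versions before 4.0
)

sapply(people, class)  # data type of each column
str(people)            # compact overview of structure and types&lt;/pre&gt;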
&lt;h2&gt;Exercise: Creating Your First Data Frame&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;&lt;tr class="header"&gt;
&lt;th align="left"&gt;weekday&lt;/th&gt;
&lt;th align="right"&gt;temperature&lt;/th&gt;
&lt;th align="left"&gt;hot&lt;/th&gt;
&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class="odd"&gt;
&lt;td align="left"&gt;Monday&lt;/td&gt;
&lt;td align="right"&gt;28&lt;/td&gt;
&lt;td align="left"&gt;FALSE&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="even"&gt;
&lt;td align="left"&gt;Tuesday&lt;/td&gt;
&lt;td align="right"&gt;31&lt;/td&gt;
&lt;td align="left"&gt;TRUE&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="odd"&gt;
&lt;td align="left"&gt;Wednesday&lt;/td&gt;
&lt;td align="right"&gt;25&lt;/td&gt;
&lt;td align="left"&gt;FALSE&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;Let’s create a data frame as shown above using the &lt;code&gt;data.frame()&lt;/code&gt; function. The resulting data frame should consist of the three columns &lt;code&gt;weekday&lt;/code&gt;, &lt;code&gt;temperature&lt;/code&gt; and &lt;code&gt;hot&lt;/code&gt;:&lt;/p&gt;
&lt;ol style="list-style-type: decimal"&gt;
&lt;li&gt;The first column named &lt;code&gt;weekday&lt;/code&gt; contains the weekday names &lt;code&gt;"Monday"&lt;/code&gt;, &lt;code&gt;"Tuesday"&lt;/code&gt;, &lt;code&gt;"Wednesday"&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;The second column named &lt;code&gt;temperature&lt;/code&gt; contains the temperatures (in degrees Celsius) as &lt;code&gt;28&lt;/code&gt;, &lt;code&gt;31&lt;/code&gt;, &lt;code&gt;25&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;The third column named &lt;code&gt;hot&lt;/code&gt; contains the logical values &lt;code&gt;FALSE&lt;/code&gt;, &lt;code&gt;TRUE&lt;/code&gt;, &lt;code&gt;FALSE&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Store the final data frame in the variable &lt;code&gt;temp&lt;/code&gt; and print its output to the console:&lt;/p&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/02-data-frames-tibbles/01-build-data-frame-from-vectors/exercise-02-01-03"&gt;Start Exercise&lt;/a&gt;
&lt;h2&gt;Quiz: Which statements are true about this data frame?&lt;/h2&gt;
&lt;pre&gt;price &lt;- c(28, 31, 25)
data.frame(
  weekday = c("Monday", "Tuesday", "Wednesday", "Thursday"),
  price = price,
  expensive = price &gt; 30
)&lt;/pre&gt;
Which statements are true about the data frame above?
&lt;ul&gt;&lt;li&gt;The &lt;code&gt;data.frame()&lt;/code&gt; function will fail because the column &lt;code&gt;expensive&lt;/code&gt; is not a vector.&lt;/li&gt;&lt;li&gt;The &lt;code&gt;data.frame()&lt;/code&gt; function will not fail.&lt;/li&gt;&lt;li&gt;The &lt;code&gt;data.frame()&lt;/code&gt; function fails because the lengths of the vectors are different.&lt;/li&gt;&lt;li&gt;The command would work if &lt;code&gt;weekday&lt;/code&gt; had the values &lt;code&gt;c(&amp;quot;Monday&amp;quot;, &amp;quot;Tuesday&amp;quot;, &amp;quot;Wednesday&amp;quot;)&lt;/code&gt;.&lt;/li&gt;&lt;/ul&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/02-data-frames-tibbles/01-build-data-frame-from-vectors/quiz-3"&gt;Start Quiz&lt;/a&gt;
&lt;p&gt;Build a data frame from vectors is an excerpt from the course &lt;a href="https://www.quantargo.com/courses/course-r-introduction"&gt;Introduction to R&lt;/a&gt;, which is available for free at &lt;a href="https://www.quantargo.com"&gt;quantargo.com&lt;/a&gt;&lt;/p&gt;&lt;p&gt;&lt;a href="https://www.quantargo.com/courses/course-r-introduction"&gt;VIEW FULL COURSE&lt;/a&gt;</description></item><item><title>Use existing functions and data through packages</title><link>https://www.quantargo.com/courses/course-r-introduction/01-basics/06-packages</link><pubDate>Thu, 14 May 2020 09:51:58 +0000</pubDate><guid>https://www.quantargo.com/courses/course-r-introduction/01-basics/06-packages</guid><category>R</category><category>Course</category><category>Introduction</category><category>Basics</category><category>Interactive</category><description>&lt;img src="https://www.quantargo.com/assets/courses/course-r-introduction/01-basics/06-packages.png"&gt;
&lt;p&gt;Packages give you access to a huge set of functions and datasets, most of which are provided by the generous R community. They are the secret sauce which makes it possible to use R for pretty much anything you can imagine. Additionally, many packages are open source, which makes them a great learning resource.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Get to know the concept of packages in R&lt;/li&gt;
&lt;li&gt;Learn how to call functions from packages&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;library(___)
data(___)&lt;/pre&gt;
&lt;h2&gt;Introduction to packages&lt;/h2&gt;
&lt;p&gt;Packages are one of the best things in R. They add new functions and features to the language environment and extend its applications over many different use cases and domains. Packages are supported by a large community of developers and allow R to connect to many different external algorithms and libraries—many of them even written in different programming languages.&lt;/p&gt;
&lt;p&gt;Contributors all over the world including developers or domain experts in physics, finance, statistics etc. create a lot of additional content, such as custom functions for specific use cases. These functions, together with documentation, help files and datasets can be gathered into packages. Packages can be made public through &lt;em&gt;package repositories&lt;/em&gt; so that anyone can install and use them. The most popular package repository is &lt;a href="https://cran.r-project.org/"&gt;CRAN&lt;/a&gt; which hosts over 15,000 packages.&lt;/p&gt;
&lt;img src="https://www.quantargo.com/assets/courses/course-r-introduction/01-basics/06-packages_files/figure-html/unnamed-chunk-2-1.png"&gt;
&lt;h2&gt;Calling a package&lt;/h2&gt;
&lt;p&gt;As a demonstration we will use the &lt;code&gt;generate_primes()&lt;/code&gt; function from the &lt;code&gt;primes&lt;/code&gt; package. This function takes two numbers as parameters and outputs all prime numbers inside their range.&lt;/p&gt;
&lt;p&gt;In order to use a package we first need to load it. This can be done by applying the &lt;code&gt;library()&lt;/code&gt; function and inserting the name of the package as the first argument of the function. After that, we have access to all of the content in the package and can use functions from it as usual.&lt;/p&gt;
&lt;pre&gt;library(primes)
generate_primes(min = 500, max = 550)&lt;/pre&gt;
&lt;pre&gt;[1] 503 509 521 523 541 547&lt;/pre&gt;
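&lt;p&gt;The same pattern works for any installed package. As a minimal sketch, the following uses the &lt;code&gt;tools&lt;/code&gt; package, chosen here only because it ships with R and needs no installation. It also shows the &lt;code&gt;package::function()&lt;/code&gt; notation, which calls a function from a package without loading the package first:&lt;/p&gt;
&lt;pre&gt;# Load the package, then call one of its functions by name
library(tools)
file_ext("report.csv")

# Call a function directly via package::function()
tools::file_path_sans_ext("report.csv")&lt;/pre&gt;
&lt;pre&gt;[1] "csv"
[1] "report"&lt;/pre&gt;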
&lt;h2&gt;Exercise: Check for leap year&lt;/h2&gt;
&lt;ol style="list-style-type: decimal"&gt;
&lt;li&gt;Load the &lt;code&gt;lubridate&lt;/code&gt; package.&lt;/li&gt;
&lt;li&gt;Use the &lt;code&gt;leap_year()&lt;/code&gt; function to check whether 2020 is a leap year. (Hint: the function takes the year as a number in its first parameter, &lt;code&gt;date&lt;/code&gt;.)&lt;/li&gt;
&lt;/ol&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/01-basics/06-packages/exercise-01-06-00"&gt;Start Exercise&lt;/a&gt;
&lt;p&gt;Use existing functions and data through packages is an excerpt from the course &lt;a href="https://www.quantargo.com/courses/course-r-introduction"&gt;Introduction to R&lt;/a&gt;, which is available for free at &lt;a href="https://www.quantargo.com"&gt;quantargo.com&lt;/a&gt;&lt;/p&gt;&lt;p&gt;&lt;a href="https://www.quantargo.com/courses/course-r-introduction"&gt;VIEW FULL COURSE&lt;/a&gt;</description></item><item><title>Call existing R code through functions</title><link>https://www.quantargo.com/courses/course-r-introduction/01-basics/05-functions</link><pubDate>Mon, 11 May 2020 20:11:16 +0000</pubDate><guid>https://www.quantargo.com/courses/course-r-introduction/01-basics/05-functions</guid><category>R</category><category>Course</category><category>Introduction</category><category>Basics</category><category>Interactive</category><description>&lt;img src="https://www.quantargo.com/assets/courses/course-r-introduction/01-basics/05-functions.png"&gt;
&lt;p&gt;When you write code, functions are your best friends. They can make hard things very easy or provide new functionality in a nice way. Through functions you gain access to all the powerful features R has to offer.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Call functions with function names and round brackets&lt;/li&gt;
&lt;li&gt;Use basic mathematical functions on vectors&lt;/li&gt;
&lt;li&gt;Customize functions through parameters &lt;/li&gt;
&lt;li&gt;Create number sequences using &lt;code&gt;seq()&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Create random numbers using &lt;code&gt;runif()&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Sample vectors using &lt;code&gt;sample()&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;abs(___)
sqrt(___)
seq(___)
runif(___)&lt;/pre&gt;
&lt;h2&gt;Introduction to functions&lt;/h2&gt;
&lt;p&gt;Functions in any programming language can be described as predefined, reusable code intended to accomplish a specific task. Functions in R are called by writing their name followed by round brackets. Inside the brackets, we can specify parameters for the function. One function we have already used extensively is the concatenate function &lt;code&gt;c()&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;A simple function for example is &lt;code&gt;abs()&lt;/code&gt; which is used to get the absolute value of a number. In the following example, the function is given &lt;code&gt;-3&lt;/code&gt; as input and returns the result &lt;code&gt;3&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;abs(-3)&lt;/pre&gt;
&lt;pre&gt;[1] 3&lt;/pre&gt;
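&lt;p&gt;Many mathematical functions also accept a whole vector as input and are then applied to each element. A small sketch with made-up values stored in a variable called &lt;code&gt;values&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;values &lt;- c(-2, 0, 5)
abs(values)          # absolute value of every element
sqrt(c(4, 9, 16))    # square root of every element&lt;/pre&gt;
&lt;pre&gt;[1] 2 0 5
[1] 2 3 4&lt;/pre&gt;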
&lt;h2&gt;Exercise: Use the sqrt() function&lt;/h2&gt;
&lt;p&gt;Use the &lt;code&gt;sqrt()&lt;/code&gt; function to get the square root of 8.&lt;/p&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/01-basics/05-functions/exercise-01-05-00"&gt;Start Exercise&lt;/a&gt;
&lt;h2&gt;Customizing functions through parameters&lt;/h2&gt;
&lt;p&gt;Functions take parameters that customize them for the given task. For example, the &lt;code&gt;runif()&lt;/code&gt; function generates uniformly distributed random values, which means that all values within the given range are equally likely. Its parameters, shown with their default values, are:&lt;/p&gt;
&lt;pre&gt;runif(n, min = 0, max = 1)&lt;/pre&gt;
&lt;p&gt;The first parameter &lt;code&gt;n&lt;/code&gt; is the number of values we want to generate. This parameter is mandatory: the function does not work unless we specify it.&lt;/p&gt;
&lt;p&gt;On the other hand, some of the parameters have default values, indicated by the equals sign &lt;code&gt;=&lt;/code&gt;. If we don’t explicitly specify these parameters in the brackets, the function uses the default values. Let’s take a look at an example:&lt;/p&gt;
&lt;pre&gt;runif(n = 5)&lt;/pre&gt;
&lt;pre&gt;[1] 0.08988000 0.07848433 0.59898103 0.57674865 0.62216434&lt;/pre&gt;
&lt;p&gt;The output is a numeric vector of 5 numbers. Each of them is between 0 and 1, since we did not change the default setting. If we changed the parameters &lt;code&gt;min&lt;/code&gt; and &lt;code&gt;max&lt;/code&gt; as well, we could further customize the output:&lt;/p&gt;
&lt;pre&gt;runif(n = 5, min = 8, max = 9)&lt;/pre&gt;
&lt;pre&gt;[1] 8.963653 8.789039 8.520760 8.614895 8.852204&lt;/pre&gt;
&lt;p&gt;It is also possible to leave out the name of the parameters and simply type in the input values like this:&lt;/p&gt;
&lt;pre&gt;runif(5, 8, 9)&lt;/pre&gt;
&lt;pre&gt;[1] 8.714105 8.409777 8.189146 8.849575 8.224963&lt;/pre&gt;
&lt;p&gt;However, in this case we must be careful about the order of the inputs, since each function defines an order for its parameters. If we don’t explicitly name the parameters we are setting, R assumes that we pass them in this predefined order.&lt;/p&gt;
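&lt;p&gt;When parameters are named, their order no longer matters. The following sketch illustrates this. The call to &lt;code&gt;set.seed()&lt;/code&gt; is used here only to make the random draws repeatable so that the two results can be compared, and the variable names &lt;code&gt;x&lt;/code&gt; and &lt;code&gt;y&lt;/code&gt; are arbitrary:&lt;/p&gt;
&lt;pre&gt;set.seed(1)
x &lt;- runif(n = 3, min = 8, max = 9)

set.seed(1)
y &lt;- runif(max = 9, min = 8, n = 3)  # same parameters, named in a different order

identical(x, y)&lt;/pre&gt;
&lt;pre&gt;[1] TRUE&lt;/pre&gt;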
&lt;h2&gt;Exercise: Use the sample() function&lt;/h2&gt;
&lt;p&gt;The &lt;code&gt;sample()&lt;/code&gt; function takes a vector and returns a random sample from it. The first two of its parameters are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;x&lt;/code&gt;, which defines the vector&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;size&lt;/code&gt;, which defines the number of elements we want to include in the random sample&lt;/li&gt;
&lt;/ul&gt;
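&lt;p&gt;As a small sketch with a made-up vector (not the exercise data), a call naming both parameters could look like this. The sample is random, so the selected values will usually differ from run to run:&lt;/p&gt;
&lt;pre&gt;numbers &lt;- c(10, 20, 30, 40, 50)

# Draw 2 of the 5 elements at random
picked &lt;- sample(x = numbers, size = 2)

length(picked)  # 2&lt;/pre&gt;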

&lt;p&gt;Use the &lt;code&gt;sample()&lt;/code&gt; function and sample 5 random values from the &lt;code&gt;full&lt;/code&gt; variable.&lt;/p&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/01-basics/05-functions/exercise-01-05-01"&gt;Start Exercise&lt;/a&gt;
&lt;h2&gt;Exercise: Use the seq() function&lt;/h2&gt;
&lt;p&gt;The &lt;code&gt;seq()&lt;/code&gt; function creates a sequence of numbers. The first three of its parameters are &lt;code&gt;from&lt;/code&gt;, &lt;code&gt;to&lt;/code&gt; and &lt;code&gt;by&lt;/code&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;from&lt;/code&gt; defines the start of the sequence&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;to&lt;/code&gt; defines the end of the sequence&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;by&lt;/code&gt; sets the steps between the single values&lt;/li&gt;
&lt;/ul&gt;
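&lt;p&gt;Note that &lt;code&gt;seq()&lt;/code&gt; also accepts a fractional step size. A brief sketch with arbitrary values:&lt;/p&gt;
&lt;pre&gt;# Count from 1 to 2 in steps of 0.25
seq(from = 1, to = 2, by = 0.25)&lt;/pre&gt;
&lt;pre&gt;[1] 1.00 1.25 1.50 1.75 2.00&lt;/pre&gt;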

&lt;p&gt;Use the &lt;code&gt;seq()&lt;/code&gt; function and create a sequence of numbers from 2 to 10 but only include every second value. Thus, the output should be: &lt;code&gt;2&lt;/code&gt;, &lt;code&gt;4&lt;/code&gt;, &lt;code&gt;6&lt;/code&gt;, &lt;code&gt;8&lt;/code&gt;, &lt;code&gt;10&lt;/code&gt;.&lt;/p&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/01-basics/05-functions/exercise-01-05-02"&gt;Start Exercise&lt;/a&gt;
&lt;p&gt;Call existing R code through functions is an excerpt from the course &lt;a href="https://www.quantargo.com/courses/course-r-introduction"&gt;Introduction to R&lt;/a&gt;, which is available for free at &lt;a href="https://www.quantargo.com"&gt;quantargo.com&lt;/a&gt;&lt;/p&gt;&lt;p&gt;&lt;a href="https://www.quantargo.com/courses/course-r-introduction"&gt;VIEW FULL COURSE&lt;/a&gt;</description></item><item><title>Use basic operators</title><link>https://www.quantargo.com/courses/course-r-introduction/01-basics/04-operators</link><pubDate>Fri, 08 May 2020 11:31:23 +0000</pubDate><guid>https://www.quantargo.com/courses/course-r-introduction/01-basics/04-operators</guid><category>R</category><category>Course</category><category>Introduction</category><category>Basics</category><category>Interactive</category><description>&lt;img src="https://www.quantargo.com/assets/courses/course-r-introduction/01-basics/04-operators.png"&gt;
&lt;p&gt;R is not only good for analysing and visualizing data, but also for solving maths problems or comparing data with each other. Plus you can use it just like a pocket calculator.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Use R as a pocket calculator&lt;/li&gt;
&lt;li&gt;Use arithmetic operators on vectors&lt;/li&gt;
&lt;li&gt;Use relational operators on vectors&lt;/li&gt;
&lt;li&gt;Use logical operators on vectors&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;___ + ___
___ - ___
___ / ___
___ * ___
___ ^ ___

___ == ___
___ != ___
___ &lt; ___
___ &gt; ___
___ &lt;= ___
___ &gt;= ___

___ &amp; ___
___ | ___

___ %in% ___&lt;/pre&gt;
&lt;h2&gt;Using R as a pocket calculator&lt;/h2&gt;
&lt;pre&gt;___ + ___
___ - ___
___ / ___
___ * ___&lt;/pre&gt;
&lt;p&gt;R is a programming language mainly developed for statistics and data analysis. Within R you can use mathematical operators just like you would use on a calculator. For example, you can add &lt;code&gt;+&lt;/code&gt; and subtract &lt;code&gt;-&lt;/code&gt; numbers from each other:&lt;/p&gt;
&lt;pre&gt;5 + 5&lt;/pre&gt;
&lt;pre&gt;[1] 10&lt;/pre&gt;
&lt;pre&gt;7 - 3.5&lt;/pre&gt;
&lt;pre&gt;[1] 3.5&lt;/pre&gt;
&lt;p&gt;Similarly, you can multiply &lt;code&gt;*&lt;/code&gt; or divide &lt;code&gt;/&lt;/code&gt; numbers:&lt;/p&gt;
&lt;pre&gt;5 * 7&lt;/pre&gt;
&lt;pre&gt;[1] 35&lt;/pre&gt;
&lt;pre&gt;8 / 4&lt;/pre&gt;
&lt;pre&gt;[1] 2&lt;/pre&gt;
&lt;p&gt;You can take the power of a number by using the &lt;code&gt;^&lt;/code&gt; sign:&lt;/p&gt;
&lt;pre&gt;2 ^ 3&lt;/pre&gt;
&lt;pre&gt;[1] 8&lt;/pre&gt;
&lt;p&gt;According to the rules of mathematics, you can use round brackets to specify the order of evaluation in more complex tasks:&lt;/p&gt;
&lt;pre&gt;5 * (2 + 4 / 2)&lt;/pre&gt;
&lt;pre&gt;[1] 20&lt;/pre&gt;
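&lt;p&gt;Without brackets, R follows the usual precedence rules: multiplication and division are evaluated before addition and subtraction. A short sketch comparing both cases:&lt;/p&gt;
&lt;pre&gt;2 + 3 * 4     # multiplication first
(2 + 3) * 4   # brackets force the addition first&lt;/pre&gt;
&lt;pre&gt;[1] 14
[1] 20&lt;/pre&gt;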
&lt;h2&gt;Exercise: Use basic arithmetic&lt;/h2&gt;
&lt;p&gt;To calculate the mean of the numbers &lt;code&gt;2&lt;/code&gt;, &lt;code&gt;3&lt;/code&gt;, &lt;code&gt;7&lt;/code&gt; and &lt;code&gt;8&lt;/code&gt;:&lt;/p&gt;
&lt;ol style="list-style-type: decimal"&gt;
&lt;li&gt;Add all the numbers together using &lt;code&gt;+&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Divide the result by the number of elements.&lt;/li&gt;
&lt;li&gt;Make sure that the result of the addition is calculated first by using round brackets &lt;code&gt;()&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/01-basics/04-operators/exercise-01-04-00"&gt;Start Exercise&lt;/a&gt;
&lt;h2&gt;Applying arithmetic operators on vectors&lt;/h2&gt;
&lt;pre&gt;___ + ___
___ - ___
___ / ___
___ * ___&lt;/pre&gt;
&lt;p&gt;Operations such as addition, subtraction, multiplication and division are called &lt;em&gt;arithmetic operations&lt;/em&gt;. They work not only on single values but also on vectors. When you apply an arithmetic operation to two vectors, it is carried out element by element: each number in the first vector is combined with the number at the same position in the second vector.&lt;/p&gt;
&lt;p&gt;In the following example we create two numeric vectors and assign them to the variables &lt;code&gt;a&lt;/code&gt; and &lt;code&gt;b&lt;/code&gt;. We then add them together:&lt;/p&gt;
&lt;pre&gt;a &lt;- c(1, 3, 6, 9, 12, 15)
b &lt;- c(2, 4, 6, 8, 10, 12)
a + b&lt;/pre&gt;
&lt;pre&gt;[1]  3  7 12 17 22 27&lt;/pre&gt;
&lt;p&gt;As the output shows, the first elements of the two vectors were added together and resulted in &lt;code&gt;1 + 2 = 3&lt;/code&gt;. The second elements added up to &lt;code&gt;3 + 4 = 7&lt;/code&gt;, the third elements to &lt;code&gt;6 + 6 = 12&lt;/code&gt; and so on.&lt;/p&gt;
&lt;p&gt;We can apply any other arithmetic operation in a similar way:&lt;/p&gt;
&lt;pre&gt;a &lt;- c(22, 10, 7, 3, 14, 4)
b &lt;- c(4, 5, 2, 6, 14, 8)
a / b&lt;/pre&gt;
&lt;pre&gt;[1] 5.5 2.0 3.5 0.5 1.0 0.5&lt;/pre&gt;
&lt;p&gt;Using the same principle, the first element of the result is &lt;code&gt;22 / 4 = 5.5&lt;/code&gt;, the second is &lt;code&gt;10 / 5 = 2&lt;/code&gt; and so on.&lt;/p&gt;
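&lt;p&gt;An arithmetic operator can also combine a vector with a single number, in which case the single number is applied to every element of the vector. A minimal sketch, using a made-up vector called &lt;code&gt;prices&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;prices &lt;- c(10, 20, 30)

prices * 2           # every element is doubled
prices - c(1, 2, 3)  # element-wise subtraction of two vectors&lt;/pre&gt;
&lt;pre&gt;[1] 20 40 60
[1]  9 18 27&lt;/pre&gt;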
&lt;h2&gt;Quiz: Vector Multiplication&lt;/h2&gt;
&lt;pre&gt;odd &lt;- c(1, 3, 5)
even &lt;- c(2, 4, 6)
odd * even&lt;/pre&gt;
Inspect the code chunk above. What is the result of the multiplication?
&lt;ul&gt;&lt;li&gt;&lt;code&gt;108&lt;/code&gt;&lt;/li&gt;&lt;li&gt;&lt;code&gt;54&lt;/code&gt;&lt;/li&gt;&lt;li&gt;&lt;code&gt;15&lt;/code&gt;, &lt;code&gt;48&lt;/code&gt;&lt;/li&gt;&lt;li&gt;&lt;code&gt;2&lt;/code&gt;, &lt;code&gt;12&lt;/code&gt;, &lt;code&gt;30&lt;/code&gt;&lt;/li&gt;&lt;li&gt;&lt;code&gt;18&lt;/code&gt;, &lt;code&gt;36&lt;/code&gt;, &lt;code&gt;54&lt;/code&gt;&lt;/li&gt;&lt;/ul&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/01-basics/04-operators/quiz-1"&gt;Start Quiz&lt;/a&gt;
&lt;h2&gt;Exercise: Multiply numeric vectors&lt;/h2&gt;
&lt;p&gt;Multiply the numeric vectors &lt;code&gt;ascending&lt;/code&gt; and &lt;code&gt;descending&lt;/code&gt;:&lt;/p&gt;
&lt;ol style="list-style-type: decimal"&gt;
&lt;li&gt;Create a vector with the numbers &lt;code&gt;1&lt;/code&gt;, &lt;code&gt;2&lt;/code&gt;, &lt;code&gt;3&lt;/code&gt; and &lt;code&gt;4&lt;/code&gt; and assign it to the variable &lt;code&gt;ascending&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Create a vector with the numbers &lt;code&gt;4&lt;/code&gt;, &lt;code&gt;3&lt;/code&gt;, &lt;code&gt;2&lt;/code&gt; and &lt;code&gt;1&lt;/code&gt; and assign it to the variable &lt;code&gt;descending&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Multiply (&lt;code&gt;*&lt;/code&gt;) the variable &lt;code&gt;ascending&lt;/code&gt; with the variable &lt;code&gt;descending&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/01-basics/04-operators/exercise-01-04-02"&gt;Start Exercise&lt;/a&gt;
&lt;h2&gt;Using relational operators&lt;/h2&gt;
&lt;pre&gt;___ == ___
___ != ___
___ &lt; ___
___ &gt; ___
___ &lt;= ___
___ &gt;= ___&lt;/pre&gt;
&lt;p&gt;Relational operators are used to compare two values. The output of these operations is always a logical value &lt;code&gt;TRUE&lt;/code&gt; or &lt;code&gt;FALSE&lt;/code&gt;. We distinguish six different types of relational operators, as we’ll see below.&lt;/p&gt;
&lt;p&gt;The &lt;em&gt;equal&lt;/em&gt; &lt;code&gt;==&lt;/code&gt; and &lt;em&gt;not equal&lt;/em&gt; &lt;code&gt;!=&lt;/code&gt; operators check whether two values are the same (or not):&lt;/p&gt;
&lt;pre&gt;2 == 1 + 1&lt;/pre&gt;
&lt;pre&gt;[1] TRUE&lt;/pre&gt;
&lt;pre&gt;2 != 3&lt;/pre&gt;
&lt;pre&gt;[1] TRUE&lt;/pre&gt;
&lt;p&gt;The &lt;em&gt;less than&lt;/em&gt; &lt;code&gt;&amp;lt;&lt;/code&gt; and &lt;em&gt;greater than&lt;/em&gt; &lt;code&gt;&amp;gt;&lt;/code&gt; operators check whether a value is less or greater than another one:&lt;/p&gt;
&lt;pre&gt;2 &gt; 4&lt;/pre&gt;
&lt;pre&gt;[1] FALSE&lt;/pre&gt;
&lt;pre&gt;2 &lt; 4&lt;/pre&gt;
&lt;pre&gt;[1] TRUE&lt;/pre&gt;
&lt;p&gt;The &lt;em&gt;less than or equal to&lt;/em&gt; &lt;code&gt;&amp;lt;=&lt;/code&gt; and the &lt;em&gt;greater than or equal to&lt;/em&gt; &lt;code&gt;&amp;gt;=&lt;/code&gt; operators combine the check for equality with either the less or the greater than comparison:&lt;/p&gt;
&lt;pre&gt;2 &gt;= 2&lt;/pre&gt;
&lt;pre&gt;[1] TRUE&lt;/pre&gt;
&lt;pre&gt;2 &lt;= 3&lt;/pre&gt;
&lt;pre&gt;[1] TRUE&lt;/pre&gt;
&lt;p&gt;All of these operators can be used on vectors with one or more elements as well. In that case, each element of one vector is compared with the element at the same position in the other vector, just as with the arithmetic operators:&lt;/p&gt;
&lt;pre&gt;vector1 &lt;- c(3, 5, 2, 7, 4, 2)
vector2 &lt;- c(2, 6, 3, 3, 4, 1)
vector1 &gt; vector2&lt;/pre&gt;
&lt;pre&gt;[1]  TRUE FALSE FALSE  TRUE FALSE  TRUE&lt;/pre&gt;
&lt;p&gt;Therefore, the output of this example is based on the comparisons &lt;code&gt;3 &amp;gt; 2&lt;/code&gt;, &lt;code&gt;5 &amp;gt; 6&lt;/code&gt;, &lt;code&gt;2 &amp;gt; 3&lt;/code&gt; and so on.&lt;/p&gt;
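&lt;p&gt;The other relational operators work on vectors in the same way. Because &lt;code&gt;TRUE&lt;/code&gt; counts as 1 and &lt;code&gt;FALSE&lt;/code&gt; as 0 in arithmetic, &lt;code&gt;sum()&lt;/code&gt; can be used to count how many comparisons are true. A sketch with arbitrary values in the hypothetical variables &lt;code&gt;scores&lt;/code&gt; and &lt;code&gt;target&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;scores &lt;- c(3, 5, 2)
target &lt;- c(3, 4, 4)

scores == target        # equal at the same position?
scores &gt;= target        # at least as large as the target?
sum(scores &gt;= target)   # number of TRUE values&lt;/pre&gt;
&lt;pre&gt;[1]  TRUE FALSE FALSE
[1]  TRUE  TRUE FALSE
[1] 2&lt;/pre&gt;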
&lt;h2&gt;Exercise: Compare numeric values&lt;/h2&gt;
&lt;p&gt;Use the appropriate relational operator and check whether 3 is greater than or equal to 2.&lt;/p&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/01-basics/04-operators/exercise-01-04-03"&gt;Start Exercise&lt;/a&gt;
&lt;h2&gt;Exercise: Compare temperatures&lt;/h2&gt;
&lt;p&gt;In the following exercise, we make use of the &lt;a href="https://www.data.gv.at/katalog/dataset/5eb8278a-4ecf-41e2-a1f8-03383f31af7d" target="_blank"&gt;weather data&lt;/a&gt; gathered by the city of Innsbruck over the last decades. You are given two variables, &lt;code&gt;avgtemp_1997_2006&lt;/code&gt; and &lt;code&gt;avgtemp_2007_2016&lt;/code&gt;, containing the monthly average temperatures in Innsbruck for the years 1997 to 2006 and 2007 to 2016, respectively.&lt;/p&gt;
&lt;p&gt;Use an appropriate relational operator and check in which months there was an increase in the average temperature.&lt;/p&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/01-basics/04-operators/exercise-01-04-04"&gt;Start Exercise&lt;/a&gt;
&lt;h2&gt;Using logical operators&lt;/h2&gt;
&lt;pre&gt;___ &amp; ___
___ | ___&lt;/pre&gt;
&lt;p&gt;The &lt;em&gt;AND&lt;/em&gt; operator &lt;code&gt;&amp;amp;&lt;/code&gt; is used for checking whether multiple statements are &lt;code&gt;TRUE&lt;/code&gt; at the same time. Using a simple example, we could check whether 3 is greater than 1 and at the same time whether 4 is smaller than 2:&lt;/p&gt;
&lt;pre&gt;3 &gt; 1 &amp; 4 &lt; 2&lt;/pre&gt;
&lt;pre&gt;[1] FALSE&lt;/pre&gt;
&lt;p&gt;3 is in fact greater than 1, but 4 is not smaller than 2. Since one of the statements is &lt;code&gt;FALSE&lt;/code&gt;, the output of this joined evaluation is also &lt;code&gt;FALSE&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The &lt;em&gt;OR&lt;/em&gt; operator &lt;code&gt;|&lt;/code&gt; only checks whether at least one of the statements is &lt;code&gt;TRUE&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;3 &gt; 1 | 4 &lt; 2&lt;/pre&gt;
&lt;pre&gt;[1] TRUE&lt;/pre&gt;
&lt;p&gt;In an &lt;em&gt;OR&lt;/em&gt; statement, not all elements have to be &lt;code&gt;TRUE&lt;/code&gt;. Since 3 is greater than 1, the output of this evaluation is &lt;code&gt;TRUE&lt;/code&gt; as well.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;!&lt;/code&gt; operator is used for the negation of logical values, which means it turns &lt;code&gt;TRUE&lt;/code&gt; values to &lt;code&gt;FALSE&lt;/code&gt; and &lt;code&gt;FALSE&lt;/code&gt; values to &lt;code&gt;TRUE&lt;/code&gt;. If we have a statement resulting in a logical &lt;code&gt;TRUE&lt;/code&gt; or &lt;code&gt;FALSE&lt;/code&gt; value, we can negate the result by applying the &lt;code&gt;!&lt;/code&gt; operator on it. In the following example we check whether 3 is greater than 2 and then negate the result of this comparison:&lt;/p&gt;
&lt;pre&gt;!3 &gt; 2&lt;/pre&gt;
&lt;pre&gt;[1] FALSE&lt;/pre&gt;
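&lt;p&gt;R evaluates the comparison before the negation, so &lt;code&gt;!3 &amp;gt; 2&lt;/code&gt; is read as &lt;code&gt;!(3 &amp;gt; 2)&lt;/code&gt;. Writing the brackets makes this order explicit. The &lt;code&gt;!&lt;/code&gt; operator also works element by element on logical vectors, as the following sketch with a made-up vector &lt;code&gt;is_hot&lt;/code&gt; shows:&lt;/p&gt;
&lt;pre&gt;!(3 &gt; 2)   # same result as !3 &gt; 2, but easier to read

is_hot &lt;- c(FALSE, TRUE, FALSE)
!is_hot    # flips every element&lt;/pre&gt;
&lt;pre&gt;[1] FALSE
[1]  TRUE FALSE  TRUE&lt;/pre&gt;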
&lt;p&gt;Logical operators, just like arithmetic and relational operators, can be used on longer vectors as well. In the following example we use three different vectors &lt;code&gt;a&lt;/code&gt;, &lt;code&gt;b&lt;/code&gt; and &lt;code&gt;c&lt;/code&gt; and try to evaluate multiple relations in combination.&lt;/p&gt;
&lt;pre&gt;a &lt;- c(1, 21, 3, 4)
b &lt;- c(4, 2, 5, 3)
c &lt;- c(3, 23, 5, 3)

a &gt; b &amp; b &lt; c&lt;/pre&gt;
&lt;pre&gt;[1] FALSE  TRUE FALSE FALSE&lt;/pre&gt;
&lt;p&gt;First, both relational comparisons &lt;code&gt;a &amp;gt; b&lt;/code&gt; and &lt;code&gt;b &amp;lt; c&lt;/code&gt; are evaluated and result in two logical vectors. Therefore, we essentially combine the following two vectors:&lt;/p&gt;
&lt;pre&gt;c(FALSE, TRUE, FALSE, TRUE) &amp; c(FALSE, TRUE, FALSE, FALSE)&lt;/pre&gt;
&lt;pre&gt;[1] FALSE  TRUE FALSE FALSE&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;&amp;amp;&lt;/code&gt; operator checks whether both values at the same position in the vectors are &lt;code&gt;TRUE&lt;/code&gt;. If any value of the pairs is &lt;code&gt;FALSE&lt;/code&gt;, the combination is &lt;code&gt;FALSE&lt;/code&gt; as well.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;|&lt;/code&gt; operator checks whether any of the values at the same position in the vectors is &lt;code&gt;TRUE&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;c(FALSE, TRUE, FALSE, TRUE) | c(FALSE, TRUE, FALSE, FALSE)&lt;/pre&gt;
&lt;pre&gt;[1] FALSE  TRUE FALSE  TRUE&lt;/pre&gt;
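&lt;p&gt;A typical use of these operators is checking whether values fall inside or outside a range. A minimal sketch, in which the vector &lt;code&gt;age&lt;/code&gt; and the thresholds 18 and 65 are purely illustrative:&lt;/p&gt;
&lt;pre&gt;age &lt;- c(15, 30, 70)

age &gt;= 18 &amp; age &lt; 65   # inside the range: both conditions must hold
age &lt; 18 | age &gt;= 65   # outside the range: one condition is enough&lt;/pre&gt;
&lt;pre&gt;[1] FALSE  TRUE FALSE
[1]  TRUE FALSE  TRUE&lt;/pre&gt;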
&lt;h2&gt;Exercise: Use the &amp; operator&lt;/h2&gt;
&lt;p&gt;You are given three variables &lt;code&gt;alpha&lt;/code&gt;, &lt;code&gt;beta&lt;/code&gt; and &lt;code&gt;gamma&lt;/code&gt;. Use an appropriate logical operator and check whether &lt;code&gt;alpha&lt;/code&gt; is greater than &lt;code&gt;beta&lt;/code&gt; and at the same time &lt;code&gt;gamma&lt;/code&gt; is smaller than &lt;code&gt;beta&lt;/code&gt;.&lt;/p&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/01-basics/04-operators/exercise-01-04-05"&gt;Start Exercise&lt;/a&gt;
&lt;h2&gt;Exercise: Use the | operator&lt;/h2&gt;
&lt;p&gt;You are given three variables &lt;code&gt;alpha&lt;/code&gt;, &lt;code&gt;beta&lt;/code&gt; and &lt;code&gt;gamma&lt;/code&gt;. Each contains a numeric vector of two elements. Use the appropriate logical operator and check whether &lt;code&gt;alpha&lt;/code&gt; is greater than &lt;code&gt;beta&lt;/code&gt; OR &lt;code&gt;gamma&lt;/code&gt; is less than &lt;code&gt;beta&lt;/code&gt;. (Hint: use the logical OR operator &lt;code&gt;|&lt;/code&gt;) &lt;/p&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/01-basics/04-operators/exercise-01-04-06"&gt;Start Exercise&lt;/a&gt;
&lt;h2&gt;Using the %in% operator&lt;/h2&gt;
&lt;pre&gt;___ %in% ___&lt;/pre&gt;
&lt;p&gt;One additional, frequently used special operator is the &lt;code&gt;%in%&lt;/code&gt; operator. For each element of the vector on its left, it checks whether that element is also present in the vector on its right.&lt;/p&gt;
&lt;p&gt;In the following example we use the variable &lt;code&gt;EU&lt;/code&gt; containing the two-letter country codes of all member states of the European Union. Then, we check whether the string &lt;code&gt;"AT"&lt;/code&gt; (Austria) is present in the &lt;code&gt;EU&lt;/code&gt; variable.&lt;/p&gt;
&lt;pre&gt;EU &lt;- c("AT","BE","BG","CY","CZ","DE","DK","EE","ES","FI","FR","GR","HR","HU",
        "IE","IT","LT","LU","LV","MT","NL","PL","PT","RO","SE","SI","SK")
"AT" %in% EU&lt;/pre&gt;
&lt;pre&gt;[1] TRUE&lt;/pre&gt;
&lt;p&gt;The following example extends the search and compares multiple elements with the contents of the &lt;code&gt;EU&lt;/code&gt; variable. It outputs a logical vector as a result containing a logical value for each element:&lt;/p&gt;
&lt;pre&gt;c("AT","HU","UK") %in% EU&lt;/pre&gt;
&lt;pre&gt;[1]  TRUE  TRUE FALSE&lt;/pre&gt;
&lt;p&gt;As the output shows, the first two character elements &lt;code&gt;"AT"&lt;/code&gt; and &lt;code&gt;"HU"&lt;/code&gt; are present in the variable &lt;code&gt;EU&lt;/code&gt;, whereas the third element &lt;code&gt;"UK"&lt;/code&gt; is not.&lt;/p&gt;
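&lt;p&gt;Combined with the negation operator &lt;code&gt;!&lt;/code&gt;, the &lt;code&gt;%in%&lt;/code&gt; operator can also answer the opposite question, namely which elements are &lt;em&gt;not&lt;/em&gt; present. A short sketch using the made-up vectors &lt;code&gt;vowels&lt;/code&gt; and &lt;code&gt;candidates&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;vowels &lt;- c("a", "e", "i", "o", "u")
candidates &lt;- c("r", "a", "q")

candidates %in% vowels      # which candidates are vowels?
!(candidates %in% vowels)   # which candidates are not vowels?&lt;/pre&gt;
&lt;pre&gt;[1] FALSE  TRUE FALSE
[1]  TRUE FALSE  TRUE&lt;/pre&gt;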
&lt;h2&gt;Exercise: Use the %in% operator&lt;/h2&gt;
&lt;p&gt;You are standing in the supermarket and need to determine which items you can check off your &lt;code&gt;shopping_list&lt;/code&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Use the &lt;code&gt;%in%&lt;/code&gt; operator to determine which &lt;code&gt;shopping_list&lt;/code&gt; items you can check off based on the items in your &lt;code&gt;basket&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Print the output of the resulting vector to the console.&lt;/li&gt;
&lt;/ul&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/01-basics/04-operators/exercise-01-04-09"&gt;Start Exercise&lt;/a&gt;
&lt;p&gt;Use basic operators is an excerpt from the course &lt;a href="https://www.quantargo.com/courses/course-r-introduction"&gt;Introduction to R&lt;/a&gt;, which is available for free at &lt;a href="https://www.quantargo.com"&gt;quantargo.com&lt;/a&gt;&lt;/p&gt;&lt;p&gt;&lt;a href="https://www.quantargo.com/courses/course-r-introduction"&gt;VIEW FULL COURSE&lt;/a&gt;</description></item><item><title>Create variables through assignments</title><link>https://www.quantargo.com/courses/course-r-introduction/01-basics/03-assignments</link><pubDate>Tue, 05 May 2020 08:25:31 +0000</pubDate><guid>https://www.quantargo.com/courses/course-r-introduction/01-basics/03-assignments</guid><category>R</category><category>Course</category><category>Introduction</category><category>Basics</category><category>Interactive</category><description>&lt;img src="https://www.quantargo.com/assets/courses/course-r-introduction/01-basics/03-assignments.png"&gt;
&lt;p&gt;Usually you want to store vectors and other objects into variables so you can work with them more easily. Variables are like a box with a name. You can then refer to the name to see what is stored inside.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Learn how to create a variable&lt;/li&gt;
&lt;li&gt;Use variables to store objects and vectors&lt;/li&gt;
&lt;li&gt;Reuse assigned objects through a variable name&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;___ &lt;- ___&lt;/pre&gt;
&lt;h2&gt;Assigning variables&lt;/h2&gt;
&lt;p&gt;Usually you want to use objects like vectors more than once. In order to save the trouble of retyping and recreating them all the time we would like to save them somewhere and reuse them later.&lt;/p&gt;
&lt;p&gt;To do this we can assign them to a variable &lt;em&gt;name&lt;/em&gt;. R uses the special arrow operator &lt;code&gt;&amp;lt;-&lt;/code&gt; for assigning values to a variable. The arrow is simply the combination of a less-than character (&lt;code&gt;&amp;lt;&lt;/code&gt;) and a minus sign (&lt;code&gt;-&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;Let’s take a look at an example, in which we assign a numeric vector to a variable named &lt;code&gt;numbers&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;numbers &lt;- c(1, 2, 3, 4, 5, 6, 7, 8, 9)&lt;/pre&gt;
&lt;p&gt;Now we can use the variable’s name below to see its contents:&lt;/p&gt;
&lt;pre&gt;numbers&lt;/pre&gt;
&lt;pre&gt;[1] 1 2 3 4 5 6 7 8 9&lt;/pre&gt;
&lt;p&gt;Note that when we assign something to a variable that already exists, it gets overwritten. All previous contents are automatically removed:&lt;/p&gt;
&lt;pre&gt;numbers &lt;- c(10, 11, 12, 13)
numbers&lt;/pre&gt;
&lt;pre&gt;[1] 10 11 12 13&lt;/pre&gt;
&lt;p&gt;Once you have defined a variable you can use it just like you would use the underlying vector itself. In the following example we create two numeric vectors and assign them to the variables &lt;code&gt;low&lt;/code&gt; and &lt;code&gt;high&lt;/code&gt;. Then we use these variables and concatenate the two vectors into a single one and assign it to the variable named &lt;code&gt;sequence&lt;/code&gt;. Finally we call the &lt;code&gt;sequence&lt;/code&gt; variable and inspect its contents:&lt;/p&gt;
&lt;pre&gt;low &lt;- c(1, 2, 3)
high &lt;- c(4, 5, 6)
sequence &lt;- c(low, high)
sequence&lt;/pre&gt;
&lt;pre&gt;[1] 1 2 3 4 5 6&lt;/pre&gt;
&lt;p&gt;As you can see, the vectors &lt;code&gt;1, 2, 3&lt;/code&gt; and &lt;code&gt;4, 5, 6&lt;/code&gt;, stored in the variables &lt;code&gt;low&lt;/code&gt; and &lt;code&gt;high&lt;/code&gt;, were combined into a single vector that is now the content of the variable &lt;code&gt;sequence&lt;/code&gt;.&lt;/p&gt;
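&lt;p&gt;To illustrate that a variable behaves just like the vector it holds, here is a minimal sketch that recreates &lt;code&gt;low&lt;/code&gt;, &lt;code&gt;high&lt;/code&gt; and &lt;code&gt;sequence&lt;/code&gt; from the example above. The functions &lt;code&gt;length()&lt;/code&gt; and &lt;code&gt;sum()&lt;/code&gt; and the variable &lt;code&gt;doubled&lt;/code&gt; are not part of this chapter and are only used here for illustration:&lt;/p&gt;
&lt;pre&gt;# Recreate the variables from the example above
low &lt;- c(1, 2, 3)
high &lt;- c(4, 5, 6)
sequence &lt;- c(low, high)

# The variable can be used wherever the vector itself could be used
length(sequence)   # number of elements: 6
sum(sequence)      # sum of all elements: 21

# The result of an operation can be stored in a new variable
doubled &lt;- sequence * 2
doubled            # 2 4 6 8 10 12&lt;/pre&gt;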
&lt;h2&gt;Exercise: Assign numeric vector to variable&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Use the concatenate function &lt;code&gt;c()&lt;/code&gt; and create a vector containing the numbers 2, 3, 5 and 7.&lt;/li&gt;
&lt;li&gt;Assign this vector to a variable named &lt;code&gt;primes&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/01-basics/03-assignments/exercise-01-03-00"&gt;Start Exercise&lt;/a&gt;
&lt;h2&gt;Exercise: Assign character vector to variable&lt;/h2&gt;
&lt;p&gt;Use the concatenate function &lt;code&gt;c()&lt;/code&gt; and create a vector containing the words&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;"programming"&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;"R"&lt;/code&gt; and&lt;/li&gt;
&lt;li&gt;&lt;code&gt;"variables"&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Assign this vector to the variable &lt;code&gt;fun&lt;/code&gt;.&lt;/p&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/01-basics/03-assignments/exercise-01-03-01"&gt;Start Exercise&lt;/a&gt;
&lt;h2&gt;Quiz: Variable Overriding&lt;/h2&gt;
&lt;pre&gt;fun &lt;- c("programming", "in", "R") 
fun &lt;- c("Have", "fun")
fun&lt;/pre&gt;
Inspect the code chunk above. What is the content of the variable &lt;code&gt;fun&lt;/code&gt; in the last step?
&lt;ul&gt;&lt;li&gt;&lt;code&gt;&amp;quot;programming&amp;quot;&lt;/code&gt; &lt;code&gt;&amp;quot;in&amp;quot;&lt;/code&gt; &lt;code&gt;&amp;quot;R&amp;quot;&lt;/code&gt;&lt;/li&gt;&lt;li&gt;&lt;code&gt;&amp;quot;Have&amp;quot;&lt;/code&gt; &lt;code&gt;&amp;quot;fun&amp;quot;&lt;/code&gt;&lt;/li&gt;&lt;li&gt;&lt;code&gt;&amp;quot;programming&amp;quot;&lt;/code&gt; &lt;code&gt;&amp;quot;in&amp;quot;&lt;/code&gt; &lt;code&gt;&amp;quot;R&amp;quot;&lt;/code&gt; &lt;code&gt;&amp;quot;Have&amp;quot;&lt;/code&gt; &lt;code&gt;&amp;quot;fun&amp;quot;&lt;/code&gt;&lt;/li&gt;&lt;li&gt;There is no output, only an error message.&lt;/li&gt;&lt;/ul&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/01-basics/03-assignments/quiz-1"&gt;Start Quiz&lt;/a&gt;
&lt;h2&gt;Quiz: Vector Concatenation&lt;/h2&gt;
&lt;pre&gt;fun &lt;- c("programming", "in", "R") 
fun2 &lt;- c("Have", "fun")
fun3 &lt;- c(fun2, fun)
fun3&lt;/pre&gt;
Inspect the code chunk above. What is the content of the variable &lt;code&gt;fun3&lt;/code&gt; in the last step?
&lt;ul&gt;&lt;li&gt;&lt;code&gt;&amp;quot;programming&amp;quot;&lt;/code&gt; &lt;code&gt;&amp;quot;in&amp;quot;&lt;/code&gt; &lt;code&gt;&amp;quot;R&amp;quot;&lt;/code&gt; &lt;code&gt;&amp;quot;Have&amp;quot;&lt;/code&gt; &lt;code&gt;&amp;quot;fun&amp;quot;&lt;/code&gt;&lt;/li&gt;&lt;li&gt;&lt;code&gt;&amp;quot;Have&amp;quot;&lt;/code&gt; &lt;code&gt;&amp;quot;fun&amp;quot;&lt;/code&gt;&lt;/li&gt;&lt;li&gt;&lt;code&gt;&amp;quot;Have&amp;quot;&lt;/code&gt; &lt;code&gt;&amp;quot;fun&amp;quot;&lt;/code&gt; &lt;code&gt;&amp;quot;programming&amp;quot;&lt;/code&gt; &lt;code&gt;&amp;quot;in&amp;quot;&lt;/code&gt; &lt;code&gt;&amp;quot;R&amp;quot;&lt;/code&gt;&lt;/li&gt;&lt;li&gt;There is no output, only an error message.&lt;/li&gt;&lt;/ul&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/01-basics/03-assignments/quiz-2"&gt;Start Quiz&lt;/a&gt;
&lt;h2&gt;Naming rules&lt;/h2&gt;
&lt;p&gt;There are a few rules we need to consider when creating variables.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Variable rules&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Can contain letters: &lt;span style="color:green"&gt;&lt;code&gt;example&lt;/code&gt;&lt;/span&gt;
&lt;/li&gt;
&lt;li&gt;Can contain numbers: &lt;span style="color:green"&gt;&lt;code&gt;example1&lt;/code&gt;&lt;/span&gt;
&lt;/li&gt;
&lt;li&gt;Can contain underscores: &lt;span style="color:green"&gt;&lt;code&gt;example_1&lt;/code&gt;&lt;/span&gt;
&lt;/li&gt;
&lt;li&gt;Can contain dots: &lt;span style="color:green"&gt;&lt;code&gt;example.1&lt;/code&gt;&lt;/span&gt; &lt;/li&gt;
&lt;li&gt;Cannot start with numbers: &lt;span style="color:red"&gt;&lt;code&gt;2example&lt;/code&gt;&lt;/span&gt;
&lt;/li&gt;
&lt;li&gt;Cannot start with underscores: &lt;span style="color:red"&gt;&lt;code&gt;_example&lt;/code&gt;&lt;/span&gt;
&lt;/li&gt;
&lt;li&gt;Cannot start with a dot if directly followed by a number: &lt;span style="color:red"&gt;&lt;code&gt;.2example&lt;/code&gt;&lt;/span&gt; &lt;/li&gt;
&lt;/ul&gt;
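&lt;p&gt;One way to try out these rules is the base R function &lt;code&gt;make.names()&lt;/code&gt;, which turns a text into a syntactically valid name. If the result is identical to the input, the name was already valid. This is only a sketch for illustration: &lt;code&gt;make.names()&lt;/code&gt; also changes reserved words such as &lt;code&gt;if&lt;/code&gt;, and the variable names &lt;code&gt;candidates&lt;/code&gt; and &lt;code&gt;is_valid&lt;/code&gt; are made up for this example.&lt;/p&gt;
&lt;pre&gt;# The example names from the list above
candidates &lt;- c("example", "example1", "example_1", "example.1",
                "2example", "_example", ".2example")

# A name is valid if make.names() does not need to change it
is_valid &lt;- make.names(candidates) == candidates
names(is_valid) &lt;- candidates
is_valid   # TRUE for the first four names, FALSE for the last three&lt;/pre&gt;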
&lt;h2&gt;Quiz: Naming Rules&lt;/h2&gt;
Which of the following variable names are valid?
&lt;ul&gt;&lt;li&gt;weekly+tasks&lt;/li&gt;&lt;li&gt;task2Do&lt;/li&gt;&lt;li&gt;24hour&lt;/li&gt;&lt;li&gt;.task&lt;/li&gt;&lt;/ul&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/01-basics/03-assignments/quiz-3"&gt;Start Quiz&lt;/a&gt;
&lt;p&gt;Create variables through assignments is an excerpt from the course &lt;a href="https://www.quantargo.com/courses/course-r-introduction"&gt;Introduction to R&lt;/a&gt;, which is available for free at &lt;a href="https://www.quantargo.com"&gt;quantargo.com&lt;/a&gt;&lt;/p&gt;&lt;p&gt;&lt;a href="https://www.quantargo.com/courses/course-r-introduction"&gt;VIEW FULL COURSE&lt;/a&gt;</description></item><item><title>Combine values into a vector</title><link>https://www.quantargo.com/courses/course-r-introduction/01-basics/02-vectors</link><pubDate>Fri, 01 May 2020 08:24:40 +0000</pubDate><guid>https://www.quantargo.com/courses/course-r-introduction/01-basics/02-vectors</guid><category>R</category><category>Course</category><category>Introduction</category><category>Basics</category><category>Interactive</category><description>&lt;img src="https://www.quantargo.com/assets/courses/course-r-introduction/01-basics/02-vectors.png"&gt;
&lt;p&gt;R always creates lists of values—even when there is only one value in a list. These lists are called &lt;em&gt;vectors&lt;/em&gt; and they make working with data much easier.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Everything is a vector&lt;/li&gt;
&lt;li&gt;Get to know different data types in R&lt;/li&gt;
&lt;li&gt;Learn how to create vectors&lt;/li&gt;
&lt;li&gt;Use the &lt;code&gt;:&lt;/code&gt; operator to create numeric sequences&lt;/li&gt;
&lt;li&gt;Use the concatenate function &lt;code&gt;c()&lt;/code&gt; to create vectors of different data types &lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;1:100
c(1, 2, 3, 4)
c("abc", "def", "ghi")
c(TRUE, FALSE, TRUE)&lt;/pre&gt;
&lt;h2&gt;Introduction to Vectors&lt;/h2&gt;
&lt;p&gt;A vector is a collection of elements of the same kind and the most basic data structure in R. For example, a vector could hold the four numbers &lt;code&gt;1&lt;/code&gt;, &lt;code&gt;3&lt;/code&gt;, &lt;code&gt;2&lt;/code&gt; and &lt;code&gt;5&lt;/code&gt;. Another vector could be formed with the three text strings &lt;code&gt;"Welcome"&lt;/code&gt;, &lt;code&gt;"Hi"&lt;/code&gt; and &lt;code&gt;"Hello"&lt;/code&gt;. These different kinds of values (numbers, text) are called &lt;em&gt;data types&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;A single value is also treated as a vector - a vector with only one element in it. As we will see throughout the course, this concept makes R very special. We can manipulate vectors and their values through plenty of operations that are provided by R.&lt;/p&gt;
&lt;p&gt;One key advantage of vectors is that we can apply an operation (e.g. a multiplication) to all its values at once instead of going through each item individually. This is called &lt;em&gt;vectorization&lt;/em&gt;.&lt;/p&gt;
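&lt;p&gt;As a minimal sketch of vectorization, using made-up numbers: the operation is written once and R applies it to every element of the vector.&lt;/p&gt;
&lt;pre&gt;# One multiplication is applied to all three values at once
c(10, 20, 30) * 2            # 20 40 60

# Two vectors of the same length are combined element by element
c(10, 20, 30) + c(1, 2, 3)   # 11 22 33&lt;/pre&gt;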
&lt;h2&gt;Types of vectors&lt;/h2&gt;
&lt;p&gt;Vectors can only hold elements of the same &lt;em&gt;data type&lt;/em&gt;. In this course we will work with the following three main data types:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Numeric&lt;/strong&gt; values are numbers. Although they can be further split into whole numbers (integers) and numbers with decimals (doubles), R automatically converts between these sub-types if needed. Therefore, we will collectively refer to them as just &lt;code&gt;numeric&lt;/code&gt; values.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Character&lt;/strong&gt; values contain textual content. These can be letters, symbols, spaces and numbers as well. They must be enclosed by quotation marks - either single quotes &lt;code&gt;'___'&lt;/code&gt; or double quotes &lt;code&gt;"___"&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Logical&lt;/strong&gt; values can either be &lt;code&gt;TRUE&lt;/code&gt; or &lt;code&gt;FALSE&lt;/code&gt;. They are also often referred to as &lt;em&gt;boolean&lt;/em&gt; or &lt;em&gt;binary&lt;/em&gt; values. Because a &lt;code&gt;logical&lt;/code&gt; value can only be &lt;code&gt;TRUE&lt;/code&gt; or &lt;code&gt;FALSE&lt;/code&gt;, it is most often used to answer simple questions like “Is 1 greater than 2?” or “Is it past 3 o’clock?”. These kinds of questions only need answers like “Yes” (&lt;code&gt;TRUE&lt;/code&gt;) or “No” (&lt;code&gt;FALSE&lt;/code&gt;). Importantly, in R &lt;code&gt;logical&lt;/code&gt; values are case sensitive, which means they have to be written in capital letters.&lt;/p&gt;
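&lt;p&gt;To see the three data types in action, the following sketch asks R for the type of a value with the base function &lt;code&gt;class()&lt;/code&gt;. This function is not covered in this chapter and is only used here for illustration:&lt;/p&gt;
&lt;pre&gt;class(3.14)      # "numeric"
class("Hello")   # "character"
class(TRUE)      # "logical"

# A comparison answers a yes/no question with a logical value
1 &gt; 2            # FALSE

# Lowercase spellings such as true are not logical values;
# typing true on its own would cause an error&lt;/pre&gt;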
&lt;h2&gt;Quiz: Data Types&lt;/h2&gt;
Which of the following options are valid data types in R?
&lt;ul&gt;&lt;li&gt;Numeric&lt;/li&gt;&lt;li&gt;Bytes&lt;/li&gt;&lt;li&gt;Logical&lt;/li&gt;&lt;li&gt;Simples&lt;/li&gt;&lt;/ul&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/01-basics/02-vectors/quiz-1"&gt;Start Quiz&lt;/a&gt;
&lt;h2&gt;Creating a sequence of numbers&lt;/h2&gt;
&lt;p&gt;In R, even a single value is considered a vector. Creating a vector of one element is as simple as typing its value:&lt;/p&gt;
&lt;pre&gt;4&lt;/pre&gt;
&lt;pre&gt;[1] 4&lt;/pre&gt;
&lt;p&gt;To create a sequence of numeric values we can use the &lt;code&gt;:&lt;/code&gt; operator, which takes two numbers and outputs a vector of all whole numbers in that range:&lt;/p&gt;
&lt;pre&gt;2:11&lt;/pre&gt;
&lt;pre&gt; [1]  2  3  4  5  6  7  8  9 10 11&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;:&lt;/code&gt; operator creates a vector from the number on the left-hand side to the number on the right-hand side. Therefore, the order of numbers is important. If we define the previous example the other way around, we get a vector of descending numbers, instead of ascending:&lt;/p&gt;
&lt;pre&gt;11:2&lt;/pre&gt;
&lt;pre&gt; [1] 11 10  9  8  7  6  5  4  3  2&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;:&lt;/code&gt; operator comes in handy when we need a vector of every whole number in a given range. However, if we need a vector whose numbers do not follow such a regular pattern, we require something different.&lt;/p&gt;
&lt;h2&gt;Exercise: Use the : operator&lt;/h2&gt;
&lt;p&gt;Use the &lt;code&gt;:&lt;/code&gt; operator and create a vector from 2 to 6&lt;/p&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/01-basics/02-vectors/exercise-01-02-01"&gt;Start Exercise&lt;/a&gt;
&lt;h2&gt;Concatenating numeric values to a vector&lt;/h2&gt;
&lt;p&gt;We can combine multiple numbers into a single vector using the concatenate function &lt;code&gt;c()&lt;/code&gt;, which links the elements between the parentheses together into a chain. Multiple elements need to be separated by commas.&lt;/p&gt;
&lt;p&gt;To create our first vector holding seven numbers we can use the concatenate function &lt;code&gt;c()&lt;/code&gt; like so:&lt;/p&gt;
&lt;pre&gt;c(7, 4, 2, 5, 5, 22, 1)&lt;/pre&gt;
&lt;pre&gt;[1]  7  4  2  5  5 22  1&lt;/pre&gt;
&lt;p&gt;Note that the “&lt;code&gt;[1]&lt;/code&gt;” prefix before the output above is added automatically by R whenever it prints a vector. It shows the position of the first element on that line of output. If your vectors become bigger you will see more of these prefixes. They are only there to help you find your way around the output and are not part of the vector itself.&lt;/p&gt;
&lt;p&gt;You can see this more clearly when the output spans multiple lines:&lt;/p&gt;
&lt;pre&gt;1:60&lt;/pre&gt;
&lt;pre&gt; [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21
[22] 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42
[43] 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60&lt;/pre&gt;
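&lt;p&gt;In the output above, the prefix &lt;code&gt;[22]&lt;/code&gt; marks that the second line starts with the 22nd element. As a small sketch, we can check this with the square bracket notation and the function &lt;code&gt;length()&lt;/code&gt;, neither of which is covered in this chapter. They are only used here for illustration:&lt;/p&gt;
&lt;pre&gt;length(1:60)   # 60 elements; the prefixes are not counted
(1:60)[22]     # the 22nd element: 22
(1:60)[43]     # the 43rd element: 43&lt;/pre&gt;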
&lt;h2&gt;Exercise: Concatenate numbers&lt;/h2&gt;
&lt;p&gt;Use the concatenate function &lt;code&gt;c()&lt;/code&gt; and create a vector containing the numbers 2, 3, 6 and 7&lt;/p&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/01-basics/02-vectors/exercise-01-02-02"&gt;Start Exercise&lt;/a&gt;
&lt;h2&gt;Creating character vectors&lt;/h2&gt;
&lt;p&gt;To create a character vector of one element, all we need to do is to type out the text. Remember that we need to use quotation marks (&lt;code&gt;" "&lt;/code&gt;) around character values:&lt;/p&gt;
&lt;pre&gt;"golden retriever"&lt;/pre&gt;
&lt;pre&gt;[1] "golden retriever"&lt;/pre&gt;
&lt;p&gt;To create a character vector of multiple elements, we can again use the concatenate function &lt;code&gt;c()&lt;/code&gt;. This time we will use it with characters instead of numbers:&lt;/p&gt;
&lt;pre&gt;c("golden retriever", "labrador is a family dog", "beagle")&lt;/pre&gt;
&lt;pre&gt;[1] "golden retriever"         "labrador is a family dog"
[3] "beagle"                  &lt;/pre&gt;
&lt;h2&gt;Exercise: Create a character vector&lt;/h2&gt;
&lt;p&gt;Create a character vector with the single element: &lt;code&gt;"R is awesome!"&lt;/code&gt;&lt;/p&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/01-basics/02-vectors/exercise-01-02-03"&gt;Start Exercise&lt;/a&gt;
&lt;h2&gt;Exercise: Concatenate text&lt;/h2&gt;
&lt;p&gt;Use the concatenate function &lt;code&gt;c()&lt;/code&gt; and create a vector containing four elements:&lt;/p&gt;
&lt;ol style="list-style-type: decimal"&gt;
&lt;li&gt;
&lt;code&gt;"wombat"&lt;/code&gt;,&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;"fennec fox"&lt;/code&gt;,&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;"bearded dragon"&lt;/code&gt; and&lt;/li&gt;
&lt;li&gt;&lt;code&gt;"tasmanian devil"&lt;/code&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/01-basics/02-vectors/exercise-01-02-04"&gt;Start Exercise&lt;/a&gt;
&lt;h2&gt;Creating logical vectors&lt;/h2&gt;
&lt;p&gt;Logical vectors can only hold the values &lt;code&gt;TRUE&lt;/code&gt; and &lt;code&gt;FALSE&lt;/code&gt;. To create a logical vector with a single value, type out one of the valid values &lt;code&gt;TRUE&lt;/code&gt; or &lt;code&gt;FALSE&lt;/code&gt;. Remember that they must be written with capital letters:&lt;/p&gt;
&lt;pre&gt;TRUE&lt;/pre&gt;
&lt;pre&gt;[1] TRUE&lt;/pre&gt;
&lt;p&gt;Similarly to other types of vectors, we can use the concatenate function &lt;code&gt;c()&lt;/code&gt; to create a logical vector of multiple elements:&lt;/p&gt;
&lt;pre&gt;c(TRUE, FALSE, TRUE, FALSE, TRUE)&lt;/pre&gt;
&lt;pre&gt;[1]  TRUE FALSE  TRUE FALSE  TRUE&lt;/pre&gt;
&lt;h2&gt;Exercise: Concatenate logical values&lt;/h2&gt;
&lt;p&gt;Use the concatenate function &lt;code&gt;c()&lt;/code&gt; and create a vector containing the three elements: &lt;code&gt;TRUE&lt;/code&gt;, &lt;code&gt;FALSE&lt;/code&gt; and &lt;code&gt;TRUE&lt;/code&gt;&lt;/p&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/01-basics/02-vectors/exercise-01-02-05"&gt;Start Exercise&lt;/a&gt;
&lt;h2&gt;Quiz: Vectors Recap&lt;/h2&gt;
Which of the following statements about vectors are correct?
&lt;ul&gt;&lt;li&gt;In R a single value is a vector as well&lt;/li&gt;&lt;li&gt;A vector can contain numbers and characters simultaneously&lt;/li&gt;&lt;li&gt;Elements of a character vector must be enclosed by quotation marks&lt;/li&gt;&lt;li&gt;&lt;code&gt;TRUE&lt;/code&gt; and &lt;code&gt;true&lt;/code&gt; are both the same logical value&lt;/li&gt;&lt;/ul&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/01-basics/02-vectors/quiz-2"&gt;Start Quiz&lt;/a&gt;
&lt;p&gt;Combine values into a vector is an excerpt from the course &lt;a href="https://www.quantargo.com/courses/course-r-introduction"&gt;Introduction to R&lt;/a&gt;, which is available for free at &lt;a href="https://www.quantargo.com"&gt;quantargo.com&lt;/a&gt;&lt;/p&gt;&lt;p&gt;&lt;a href="https://www.quantargo.com/courses/course-r-introduction"&gt;VIEW FULL COURSE&lt;/a&gt;</description></item><item><title>R is everywhere</title><link>https://www.quantargo.com/courses/course-r-introduction/01-basics/01-r-is-everywhere</link><pubDate>Mon, 27 Apr 2020 08:07:34 +0000</pubDate><guid>https://www.quantargo.com/courses/course-r-introduction/01-basics/01-r-is-everywhere</guid><category>R</category><category>Course</category><category>Introduction</category><category>Basics</category><category>Interactive</category><description>&lt;img src="https://www.quantargo.com/assets/courses/course-r-introduction/01-basics/01-r-is-everywhere.png"&gt;
&lt;p&gt;R is widely popular and incredibly useful for data scientists and for companies that work with data. But you can also use R for simpler things, like creating a nice chart or making a quick calculation. Getting started is pretty straightforward, too.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Learn what R is all about&lt;/li&gt;
&lt;li&gt;Get an overview of why R is useful&lt;/li&gt;
&lt;li&gt;Submit your first code exercise&lt;/li&gt;
&lt;/ul&gt;

&lt;img src="https://www.quantargo.com/assets/courses/course-r-introduction/01-basics/images/r_user_groups.png"&gt;
&lt;h2&gt;Introduction to R&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The most powerful statistical computing language on the planet.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Norman Nie, Founder of SPSS&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;R is a programming language and environment for working with data. It is loved by statisticians and data scientists for its expressive syntax and its plentiful external libraries and tools, and it runs on all major operating systems.&lt;/p&gt;
&lt;p&gt;It is &lt;em&gt;the&lt;/em&gt; Swiss army knife for data analysis and statistical computing (and you can make some pretty charts, too!). The R language is easily extensible with packages written by a large and growing community of developers around the world. You can find it pretty much anywhere—it is used by academic institutions, start-ups, international corporations and many more.&lt;/p&gt;
&lt;p&gt;This is also reflected by looking at its adoption. Here we can see a large increase in both downloads and number of packages available over the years:&lt;/p&gt;
&lt;img src="https://www.quantargo.com/assets/courses/course-r-introduction/01-basics/01-r-is-everywhere_files/figure-html/unnamed-chunk-2-1.png"&gt;
&lt;p&gt;In 2020 R celebrates the 20th anniversary of its 1.0 release with the release of version 4.0. And yes, it’s free and open source 😀&lt;/p&gt;
&lt;h2&gt;Quiz: R Facts&lt;/h2&gt;
Which of the following statements about R are correct?
&lt;ul&gt;&lt;li&gt;R only works on the Linux operating system.&lt;/li&gt;&lt;li&gt;R cannot be used in corporate environments.&lt;/li&gt;&lt;li&gt;R is a programming language geared towards data analysis.&lt;/li&gt;&lt;li&gt;R is extensible through packages developed by the community.&lt;/li&gt;&lt;/ul&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/01-basics/01-r-is-everywhere/quiz-1"&gt;Start Quiz&lt;/a&gt;
&lt;h2&gt;Why Use R?&lt;/h2&gt;
&lt;p&gt;R is a popular language for solving data analysis problems and is also used by people who traditionally do not consider themselves programmers. When creating charts and visualizations with R, you will find that you have much greater creative possibilities than with graphical applications such as Excel.&lt;/p&gt;
&lt;p&gt;Here are some of the &lt;strong&gt;features&lt;/strong&gt; R is most famous for:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Visualization&lt;/strong&gt;: Creating beautiful graphs and visualizations is one of its biggest strengths. The core language already provides a rich set of tools used for plotting charts and for all kinds of graphics. The sky’s the limit.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Reproducibility&lt;/strong&gt;: Unlike spreadsheet software, R code is not coupled to specific datasets and can easily be reused across different projects - even when the datasets contain more than 1 million rows. Easily build reusable reports and automatically generate new versions as the data changes.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Advanced modelling&lt;/strong&gt;: R provides the biggest and most powerful code base for data analysis in the world. The richness and depth of available statistical models is unparalleled and growing by the day, thanks to the huge community of open source package developers and contributors.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Automation&lt;/strong&gt;: R code can also be used to automate reports or to perform data transformations and model computations. It can also be integrated in automated production workflows, cloud computing environments and modern database systems.&lt;/p&gt;
&lt;h2&gt;Quiz: Using R&lt;/h2&gt;
What are the main reasons to use R compared to spreadsheet software?
&lt;ul&gt;&lt;li&gt;Easy to reproduce results&lt;/li&gt;&lt;li&gt;Use huge datasets with more than 1 million rows&lt;/li&gt;&lt;li&gt;Support for advanced modelling techniques including Machine Learning&lt;/li&gt;&lt;li&gt;Create beautiful visualizations&lt;/li&gt;&lt;/ul&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/01-basics/01-r-is-everywhere/quiz-2"&gt;Start Quiz&lt;/a&gt;
&lt;h2&gt;You R in Good Company&lt;/h2&gt;
&lt;p&gt;R is the de facto standard for statistical computing at academic institutions and companies around the world. Its great support for &lt;em&gt;literate programming&lt;/em&gt; (code that can be combined with human-readable text) enables researchers and data scientists to create publication-ready reports which are easy to reproduce for reviewers.&lt;/p&gt;
&lt;p&gt;The language has seen a wide adoption in various industries—see some examples below:&lt;/p&gt;
&lt;img src="https://www.quantargo.com/assets/courses/course-r-introduction/01-basics/images/you-r-in-good-company.png"&gt;
&lt;p&gt;&lt;strong&gt;Information Technology&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Microsoft: &lt;a href="https://mran.microsoft.com/open"&gt;Microsoft R Open&lt;/a&gt;, &lt;a href="https://www.microsoft.com/en-us/research/publication/trueskilltm-a-bayesian-skill-rating-system"&gt;TrueSkill(TM)&lt;/a&gt;, more &lt;a href="https://blog.revolutionanalytics.com/2018/02/what-does-microsoft-do-with-r.html"&gt;here&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Google: &lt;a href="https://research.google/pubs/pub43342/"&gt;R for Marketing Research and Analytics&lt;/a&gt;, &lt;a href="https://static.googleusercontent.com/media/www.google.com/fr//googleblogs/pdfs/google_predicting_the_present.pdf"&gt;Predicting the Present with Google Trends&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Facebook: &lt;a href="https://www.facebook.com/notes/facebook-engineering/visualizing-friendships/469716398919"&gt;Visualizing Friendships&lt;/a&gt;, &lt;a href="https://www.facebook.com/notes/facebook-data-science/the-formation-of-love/10152064609253859"&gt;The Formation of Love&lt;/a&gt;, &lt;a href="https://facebook.github.io/prophet"&gt;Prophet Package&lt;/a&gt; for time series forecasting.&lt;/li&gt;
&lt;li&gt;Others (with links to projects): &lt;a href="https://peerj.com/preprints/3182.pdf"&gt;AirBnB&lt;/a&gt;, &lt;a href="https://capetown2017.satrdays.org/talks/satRday-2017-van-heerden.pdf"&gt;Uber&lt;/a&gt;, &lt;a href="https://www.oracle.com/database/technologies/datawarehouse-bigdata/oml4r.html"&gt;Oracle&lt;/a&gt;, IBM, Twitter&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Pharma&lt;/strong&gt;: Merck, Genentech (Roche), Novartis, Pfizer&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Newspapers&lt;/strong&gt;: The Economist, The New York Times, Financial Times&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Finance&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Banks: Bank of America, J.P.Morgan, Goldman Sachs, Credit Suisse, UBS, Deutsche Bank&lt;/li&gt;
&lt;li&gt;Insurers: Lloyd’s, Allianz&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;See also the &lt;a href="https://www.r-consortium.org"&gt;R Consortium page&lt;/a&gt; for further information about industrial partners and initiatives.&lt;/p&gt;
&lt;h2&gt;Building Blocks&lt;/h2&gt;
&lt;p&gt;In the next chapters we will have a look at the most important features and concepts:&lt;/p&gt;
&lt;ol style="list-style-type: decimal"&gt;
&lt;li&gt;&lt;strong&gt;Vectors&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Variables&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Operators&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Functions&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Packages&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;So, let’s write your first code in R!&lt;/p&gt;
&lt;h2&gt;Exercise: Submit your first code&lt;/h2&gt;
&lt;p&gt;This course has code exercises to help you learn and quickly explore new concepts. After entering code in the editor, hit the “&lt;strong&gt;Submit&lt;/strong&gt;” button to execute it. The editor will give you feedback on your submission and display any output below the editor. If you need some additional help, use the “&lt;strong&gt;Get Hint&lt;/strong&gt;” button.&lt;/p&gt;
&lt;p&gt;To finish your first exercise, press the “&lt;strong&gt;Submit&lt;/strong&gt;” button.&lt;/p&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/01-basics/01-r-is-everywhere/exercise-01-01-03"&gt;Start Exercise&lt;/a&gt;
&lt;p&gt;R is everywhere is an excerpt from the course &lt;a href="https://www.quantargo.com/courses/course-r-introduction"&gt;Introduction to R&lt;/a&gt;, which is available for free at &lt;a href="https://www.quantargo.com"&gt;quantargo.com&lt;/a&gt;&lt;/p&gt;&lt;p&gt;&lt;a href="https://www.quantargo.com/courses/course-r-introduction"&gt;VIEW FULL COURSE&lt;/a&gt;</description></item><item><title>Launch of New Course Platform</title><link>https://www.quantargo.com/blog/post/2020-04-21-new-course-platform-launch</link><pubDate>Tue, 21 Apr 2020 00:00:00 +0000</pubDate><guid>https://www.quantargo.com/blog/post/2020-04-21-new-course-platform-launch</guid><category>R</category><category>Blog</category><description>&lt;h2&gt;Launch of New Course Platform&lt;/h2&gt;
&lt;img src="https://www.quantargo.com/assets/blog/2020-04-21-new-course-platform-launch/Course_1.svg.png"&gt;
&lt;p&gt;After months of hard work we are really excited to launch our brand-new course platform to learn and apply data science. Together with the new platform we also developed the first new course &lt;a href="https://www.quantargo.com/courses/course-r-introduction"&gt;Introduction to R&lt;/a&gt; which is available for free now!&lt;/p&gt;
&lt;p&gt;This release is a big milestone for us on our path to provide people with the best knowledge and tools available to apply data science. We think that programming, and data science in particular, should be taught interactively by seeing and writing real code. Our online course platform is built on an extensive set of interactive coding exercises and quizzes.&lt;/p&gt;
&lt;img src="https://www.quantargo.com/assets/blog/2020-04-21-new-course-platform-launch/exercise_demo.gif"&gt;
&lt;p&gt;But the technical achievement is not the only novelty. We diverged from a traditional course outline and completely changed how we structure our course content. Each chapter features a so-called &lt;strong&gt;recipe&lt;/strong&gt; which learners can collect by finishing exercises. Recipes depend on each other and together form a knowledge graph. In the future, learners will be able to create their own learning paths based on the dependency structure of the graph and their progress. Collected recipes are available in your &lt;strong&gt;cookbook&lt;/strong&gt; which gives you an overview of your progress.&lt;/p&gt;
&lt;p&gt;Another cool feature is the &lt;strong&gt;achievement system&lt;/strong&gt;. Code recipes can be collected in a &lt;em&gt;cookbook&lt;/em&gt; so that learners can review their achievements. Once all recipes from a topic have been collected, users get a &lt;em&gt;badge&lt;/em&gt;. The course is finished once all badges have been collected.&lt;/p&gt;
&lt;div id="new-course-available-for-free-introduction-to-r" class="section level3"&gt;
&lt;h3&gt;New Course Available for Free: Introduction to R&lt;/h3&gt;
&lt;p&gt;The first course module &lt;a href="https://www.quantargo.com/courses/course-r-introduction"&gt;Introduction to R&lt;/a&gt; is perfect for newcomers who want to get started with data science. The course teaches the programming language R and covers the &lt;strong&gt;language basics&lt;/strong&gt; so that you can transform data and &lt;strong&gt;make professional-looking graphs and charts with little effort&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.quantargo.com/courses/course-r-introduction"&gt;OPEN COURSE&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;We would love to hear your feedback - either through the feedback buttons on each page (visible for logged-in users) or via e-mail:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Course Content: &lt;a href="mailto:courses@quantargo.com"&gt;courses@quantargo.com&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Technical Issues: &lt;a href="mailto:support@quantargo.com"&gt;support@quantargo.com&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Cheers,&lt;/p&gt;
&lt;p&gt;Your Quantargo Team&lt;/p&gt;
&lt;/div&gt;</description></item><item><title>Create your first bar chart</title><link>https://www.quantargo.com/courses/course-r-introduction/04-ggplot/05-bar-charts</link><pubDate>Tue, 11 Feb 2020 08:30:00 +0000</pubDate><guid>https://www.quantargo.com/courses/course-r-introduction/04-ggplot/05-bar-charts</guid><category>R</category><category>Course</category><category>Introduction</category><category>Basics</category><category>Interactive</category><description>&lt;img src="https://www.quantargo.com/assets/courses/course-r-introduction/04-ggplot/05-bar-charts.png"&gt;
&lt;ul&gt;
&lt;li&gt;Create your first bar chart using &lt;code&gt;geom_col()&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Fill bars with color using the &lt;code&gt;fill&lt;/code&gt; aesthetic&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;ggplot(___) + 
  geom_col(
    mapping = aes(x = ___, y = ___, 
                  fill = ___)
 )&lt;/pre&gt;
&lt;h2&gt;Introduction to bar charts&lt;/h2&gt;
&lt;p&gt;Bar charts visualize &lt;code&gt;numeric&lt;/code&gt; values grouped by categories. Each category is represented by one bar with a height defined by each &lt;code&gt;numeric&lt;/code&gt; value.&lt;/p&gt;
&lt;p&gt;Bar charts are well suited to comparing values among different groups, e.g. number of votes by party, number of people in different countries or GDP per capita in different countries. They take up a fair amount of space and work best if the number of groups to compare is rather small.&lt;/p&gt;
&lt;p&gt;Below you can find an example showing the number of people (in millions) in the five biggest countries by population in 2007:&lt;/p&gt;
&lt;img src="https://www.quantargo.com/assets/courses/course-r-introduction/04-ggplot/05-bar-charts_files/figure-html/unnamed-chunk-2-1.png"&gt;
&lt;h2&gt;Creating a simple bar chart&lt;/h2&gt;
&lt;p&gt;In &lt;strong&gt;ggplot2&lt;/strong&gt;, bar charts are created using the &lt;code&gt;geom_col()&lt;/code&gt; geometric layer. The &lt;code&gt;x&lt;/code&gt; aesthetic mapping defines the different bars to be plotted. The height of each bar is defined by the variable specified in the &lt;code&gt;y&lt;/code&gt; aesthetic mapping. Both mappings, &lt;code&gt;x&lt;/code&gt; and &lt;code&gt;y&lt;/code&gt;, are required for &lt;code&gt;geom_col()&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Let’s create our first bar chart with the &lt;code&gt;gapminder_top5&lt;/code&gt; dataset. It contains population (in millions) and life expectancy data for the biggest countries by population in 2007.&lt;/p&gt;
&lt;pre&gt;ggplot(gapminder_top5) + 
  geom_col(aes(x = country, y = pop))&lt;/pre&gt;
&lt;img src="https://www.quantargo.com/assets/courses/course-r-introduction/04-ggplot/05-bar-charts_files/figure-html/unnamed-chunk-4-1.png"&gt;
&lt;p&gt;We see that the resulting bars are sorted by the country names in alphabetical order by default.&lt;/p&gt;
&lt;h2&gt;Exercise: Plot life expectancy by country&lt;/h2&gt;
&lt;p&gt;Create a bar chart showing the life expectancy of the five biggest countries by population in 2007.&lt;/p&gt;
&lt;ol style="list-style-type: decimal"&gt;
&lt;li&gt;Use the &lt;code&gt;ggplot()&lt;/code&gt; function and specify the &lt;code&gt;gapminder_top5&lt;/code&gt; dataset as input&lt;/li&gt;
&lt;li&gt;Add a &lt;code&gt;geom_col()&lt;/code&gt; layer to the plot&lt;/li&gt;
&lt;li&gt;Plot one bar for each &lt;code&gt;country&lt;/code&gt; (x aesthetic)&lt;/li&gt;
&lt;li&gt;Use life expectancy &lt;code&gt;lifeExp&lt;/code&gt; as bar height (y aesthetic)&lt;/li&gt;
&lt;/ol&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/04-ggplot/05-bar-charts/exercise-05-01"&gt;Start Exercise&lt;/a&gt;
&lt;h2&gt;Filling bars with color&lt;/h2&gt;
&lt;p&gt;Like other geoms, &lt;code&gt;geom_col()&lt;/code&gt; allows users to map additional dataset variables to the color of the bars. The &lt;code&gt;fill&lt;/code&gt; aesthetic fills the entire bar with color. It is easily confused with the &lt;code&gt;color&lt;/code&gt; aesthetic, which specifies the &lt;em&gt;line&lt;/em&gt; color of each bar’s border instead of the &lt;em&gt;fill&lt;/em&gt; color.&lt;/p&gt;
&lt;p&gt;Based on the &lt;code&gt;gapminder_top5&lt;/code&gt; dataset we plot the population (in millions) of the biggest countries and use the &lt;code&gt;continent&lt;/code&gt; variable to color each bar:&lt;/p&gt;
&lt;pre&gt;ggplot(gapminder_top5) + 
  geom_col(aes(x = country, y = pop, fill = continent))&lt;/pre&gt;
&lt;img src="https://www.quantargo.com/assets/courses/course-r-introduction/04-ggplot/05-bar-charts_files/figure-html/unnamed-chunk-6-1.png"&gt;
&lt;p&gt;Since &lt;code&gt;continent&lt;/code&gt; is a categorical variable, each continent gets its own distinct color. Let’s see what happens if we use a &lt;code&gt;numeric&lt;/code&gt; variable like life expectancy &lt;code&gt;lifeExp&lt;/code&gt; instead:&lt;/p&gt;
&lt;pre&gt;ggplot(gapminder_top5) + 
  geom_col(aes(x = country, y = pop, fill = lifeExp))&lt;/pre&gt;
&lt;img src="https://www.quantargo.com/assets/courses/course-r-introduction/04-ggplot/05-bar-charts_files/figure-html/unnamed-chunk-7-1.png"&gt;
&lt;p&gt;The bar colors now follow the &lt;strong&gt;continuous&lt;/strong&gt; color scale shown in the legend on the right. We see that &lt;code&gt;numeric&lt;/code&gt; variables can also be used to &lt;code&gt;fill&lt;/code&gt; bars.&lt;/p&gt;
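&lt;p&gt;The difference between &lt;code&gt;fill&lt;/code&gt; and &lt;code&gt;color&lt;/code&gt; can be tried out with a minimal sketch. Since &lt;code&gt;gapminder_top5&lt;/code&gt; is only available within the course, the example assumes the built-in &lt;code&gt;USArrests&lt;/code&gt; dataset instead; the helper name &lt;code&gt;arrests&lt;/code&gt; is made up for illustration:&lt;/p&gt;
&lt;pre&gt;library(ggplot2)

# USArrests ships with R; keep the first five states for a small example
arrests &amp;lt;- head(data.frame(state = rownames(USArrests), USArrests), 5)

# fill (mapped to a variable) colors the bar area,
# color (set to a constant here) only draws the border line
ggplot(arrests) +
  geom_col(aes(x = state, y = Murder, fill = UrbanPop),
           color = "black")&lt;/pre&gt;
&lt;p&gt;Here &lt;code&gt;fill&lt;/code&gt; sits inside &lt;code&gt;aes()&lt;/code&gt; because it depends on the data, while &lt;code&gt;color&lt;/code&gt; is placed outside because it is the same for all bars.&lt;/p&gt;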
&lt;h2&gt;Exercise: Plot population size by country&lt;/h2&gt;
&lt;p&gt;Create a bar chart showing the population (in millions) of the five biggest countries by population in 2007.&lt;/p&gt;
&lt;ol style="list-style-type: decimal"&gt;
&lt;li&gt;Use the &lt;code&gt;ggplot()&lt;/code&gt; function and specify the &lt;code&gt;gapminder_top5&lt;/code&gt; dataset as input&lt;/li&gt;
&lt;li&gt;Add a &lt;code&gt;geom_col()&lt;/code&gt; layer to the plot&lt;/li&gt;
&lt;li&gt;Plot one bar for each &lt;code&gt;country&lt;/code&gt; (x aesthetic)&lt;/li&gt;
&lt;li&gt;Use population &lt;code&gt;pop&lt;/code&gt; as bar height (y aesthetic)&lt;/li&gt;
&lt;li&gt;Use the GDP per capita &lt;code&gt;gdpPercap&lt;/code&gt; as &lt;code&gt;fill&lt;/code&gt; aesthetic&lt;/li&gt;
&lt;/ol&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/04-ggplot/05-bar-charts/exercise-05-02"&gt;Start Exercise&lt;/a&gt;
&lt;h2&gt;Stacked bar charts&lt;/h2&gt;
&lt;p&gt;In some circumstances it might be useful to plot multiple numeric values within each bar. A typical example is a numeric value describing one specific entity (e.g. customers) split among various categories (customer segments), so that the total bar height represents the overall number (all customers).&lt;/p&gt;
&lt;p&gt;The plot below shows the number of phones (in thousands) by continent from 1956 to 1961 as a stacked bar chart:&lt;/p&gt;
&lt;pre&gt;ggplot(world_phones) + 
  geom_col(aes(x = year, y = phones,
               fill = region))&lt;/pre&gt;
&lt;img src="https://www.quantargo.com/assets/courses/course-r-introduction/04-ggplot/05-bar-charts_files/figure-html/unnamed-chunk-10-1.png"&gt;
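&lt;p&gt;How the course dataset &lt;code&gt;world_phones&lt;/code&gt; was prepared is not shown here. As a rough sketch, a similar long-format data frame can be derived from the &lt;code&gt;WorldPhones&lt;/code&gt; matrix that ships with R. This is an assumption about the data source; the built-in matrix also contains the year 1951, and the name &lt;code&gt;phones_long&lt;/code&gt; is made up for illustration:&lt;/p&gt;
&lt;pre&gt;library(ggplot2)

# WorldPhones is a built-in matrix: rows are years, columns are regions
phones_long &amp;lt;- as.data.frame(as.table(WorldPhones))
names(phones_long) &amp;lt;- c("year", "region", "phones")

# One bar per year, stacked by region
ggplot(phones_long) +
  geom_col(aes(x = year, y = phones, fill = region))&lt;/pre&gt;
&lt;p&gt;Stacking happens automatically: &lt;code&gt;geom_col()&lt;/code&gt; places all rows that share the same &lt;code&gt;x&lt;/code&gt; value on top of each other.&lt;/p&gt;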
&lt;h2&gt;Exercise: Plot number of crimes by US states&lt;/h2&gt;
&lt;p&gt;Create a bar chart showing the number of crimes by US state per 100,000 residents in 1973.&lt;/p&gt;
&lt;ol style="list-style-type: decimal"&gt;
&lt;li&gt;Use the &lt;code&gt;ggplot()&lt;/code&gt; function and specify the &lt;code&gt;us_arrests&lt;/code&gt; dataset as input&lt;/li&gt;
&lt;li&gt;Add a &lt;code&gt;geom_col()&lt;/code&gt; layer to the plot&lt;/li&gt;
&lt;li&gt;Plot one bar for each &lt;code&gt;state&lt;/code&gt; (x aesthetic)&lt;/li&gt;
&lt;li&gt;Use the number of &lt;code&gt;cases&lt;/code&gt; as bar height (y aesthetic)&lt;/li&gt;
&lt;li&gt;Use the &lt;code&gt;crime&lt;/code&gt; type as &lt;code&gt;fill&lt;/code&gt; aesthetic.&lt;/li&gt;
&lt;/ol&gt;
&lt;a href="https://www.quantargo.com/courses/course-r-introduction/04-ggplot/05-bar-charts/exercise-05-04"&gt;Start Exercise&lt;/a&gt;
&lt;p&gt;Create your first bar chart is an excerpt from the course &lt;a href="https://www.quantargo.com/courses/course-r-introduction"&gt;Introduction to R&lt;/a&gt;, which is available for free at &lt;a href="https://www.quantargo.com"&gt;quantargo.com&lt;/a&gt;&lt;/p&gt;&lt;p&gt;&lt;a href="https://www.quantargo.com/courses/course-r-introduction"&gt;VIEW FULL COURSE&lt;/a&gt;</description></item><item><title>ViennaR Meetup March - Full Talks Online</title><link>https://www.quantargo.com/blog/post/2019-04-29-viennar-meetup-march-full-talks-online</link><pubDate>Mon, 29 Apr 2019 00:00:00 +0000</pubDate><guid>https://www.quantargo.com/blog/post/2019-04-29-viennar-meetup-march-full-talks-online</guid><category>R</category><category>Blog</category><description>&lt;h2&gt;Introduction&lt;/h2&gt;
&lt;img src="https://www.quantargo.com/assets/blog/2019-04-29-viennar-meetup-march-full-talks-online/ViennaR_Hadley_Laura.png"&gt;
&lt;p&gt;The full talks of the ViennaR March Meetup are finally online: A short Introduction to ViennaR, Laura Vana introducing &lt;a href="https://www.meetup.com/rladies-vienna"&gt;R-Ladies Vienna&lt;/a&gt; and Hadley Wickham with a great introduction to tidy(er) data and the new functions &lt;code&gt;pivot_wider()&lt;/code&gt; and &lt;code&gt;pivot_longer()&lt;/code&gt;. Stay tuned for the next &lt;a href="https://www.meetup.com/ViennaR"&gt;ViennaR Meetups&lt;/a&gt;!&lt;/p&gt;
&lt;iframe width="560" height="315" src="https://www.youtube.com/embed/QNyjpxWFVT8" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen&gt;
&lt;/iframe&gt;
&lt;p&gt;You can download the slides of the introduction &lt;a href="ViennaR-Meetup-Intro.pdf"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;div id="ronald-hochreiter-mario-annau-walter-djuric" class="section level5"&gt;
&lt;h5&gt;
&lt;a href="https://www.linkedin.com/in/ronaldhochreiter/?originalSubdomain=at"&gt;Ronald Hochreiter&lt;/a&gt;, &lt;a href="https://www.linkedin.com/in/mario-annau-a68b2056/"&gt;Mario Annau&lt;/a&gt;, &lt;a href="https://www.linkedin.com/in/walterdjuric"&gt;Walter Djuric&lt;/a&gt;
&lt;/h5&gt;
&lt;/div&gt;
&lt;h2&gt;Laura Vana: R-Ladies&lt;/h2&gt;
&lt;iframe width="560" height="315" src="https://www.youtube.com/embed/SeMP8z-2ki4" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen&gt;
&lt;/iframe&gt;
&lt;p&gt;Laura introduced &lt;a href="https://www.meetup.com/rladies-vienna"&gt;R-Ladies Vienna&lt;/a&gt;, a local chapter of R-Ladies Global, an organisation supported by the R Consortium that aims to&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;achieve proportionate representation by encouraging, inspiring, and empowering the minorities currently underrepresented in the R community. R-Ladies’ primary focus, therefore, is on supporting the R enthusiasts who identify as an underrepresented minority to achieve their programming potential, by building a collaborative global network of R leaders, mentors, learners, and developers to facilitate individual and collective progress worldwide.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;from &lt;a href="https://rladies.org" class="uri"&gt;https://rladies.org&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Visit the &lt;a href="https://www.meetup.com/rladies-vienna"&gt;Meetup site&lt;/a&gt; to find out more about upcoming R-Ladies events including workshops covering an &lt;a href="https://www.meetup.com/rladies-vienna/events/260340072"&gt;R Introduction&lt;/a&gt; and &lt;a href="https://www.meetup.com/rladies-vienna/events/260534779"&gt;Bayesian Statistics&lt;/a&gt;.&lt;/p&gt;
&lt;div id="laura-vana" class="section level5"&gt;
&lt;h5&gt;&lt;a href="https://www.linkedin.com/in/laura-vana-4b92b940/"&gt;Laura Vana&lt;/a&gt;&lt;/h5&gt;
&lt;/div&gt;
&lt;h2&gt;Hadley Wickham: Tidy(er) Data&lt;/h2&gt;
&lt;iframe width="560" height="315" src="https://www.youtube.com/embed/D48JHU4llkk" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen&gt;
&lt;/iframe&gt;
&lt;p&gt;Hadley Wickham’s talk covered the &lt;a href="https://tidyr.tidyverse.org"&gt;&lt;strong&gt;tidyr&lt;/strong&gt;&lt;/a&gt; package and two new functions, &lt;code&gt;pivot_wide()&lt;/code&gt; and &lt;code&gt;pivot_long()&lt;/code&gt;, which were eventually renamed to &lt;code&gt;pivot_wider()&lt;/code&gt; and &lt;code&gt;pivot_longer()&lt;/code&gt; as the result of a survey; see also the posts on &lt;a href="https://twitter.com/hadleywickham/status/1109816130774986753"&gt;Twitter&lt;/a&gt; and &lt;a href="https://github.com/hadley/table-shapes"&gt;GitHub&lt;/a&gt;. These functions are meant to replace &lt;code&gt;gather()&lt;/code&gt; and &lt;code&gt;spread()&lt;/code&gt;, which seem to be hard to remember for most users coming to &lt;strong&gt;tidyr&lt;/strong&gt;.&lt;/p&gt;
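&lt;p&gt;As a minimal sketch of how the renamed functions are used, the example below assumes a tidyr release that already ships them (1.0.0 or later) and the small &lt;code&gt;table4a&lt;/code&gt; example dataset included in tidyr; the names &lt;code&gt;long&lt;/code&gt; and &lt;code&gt;wide&lt;/code&gt; are made up for illustration:&lt;/p&gt;
&lt;pre&gt;library(tidyr)

# table4a has one row per country and one column per year
long &amp;lt;- pivot_longer(table4a, cols = -country,
                     names_to = "year", values_to = "cases")
long  # one row per country-year combination

# pivot_wider() reverses the operation: one column per year again
wide &amp;lt;- pivot_wider(long, names_from = year, values_from = cases)
wide&lt;/pre&gt;
&lt;p&gt;Roughly speaking, &lt;code&gt;pivot_longer()&lt;/code&gt; takes over the role of &lt;code&gt;gather()&lt;/code&gt; and &lt;code&gt;pivot_wider()&lt;/code&gt; that of &lt;code&gt;spread()&lt;/code&gt;.&lt;/p&gt;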
&lt;div id="hadley-wickham" class="section level5"&gt;
&lt;h5&gt;&lt;a href="https://www.linkedin.com/in/hadleywickham/"&gt;Hadley Wickham&lt;/a&gt;&lt;/h5&gt;
&lt;p&gt;See you at one of our next Meetups/Courses/Get-Togethers!&lt;/p&gt;
&lt;p&gt;Cheers,&lt;/p&gt;
&lt;p&gt;Your Quantargo Team&lt;/p&gt;
&lt;/div&gt;</description></item><item><title>ViennaR Meetup March - Impressions</title><link>https://www.quantargo.com/blog/post/2019-04-11-viennar-meetup-march-impressions</link><pubDate>Thu, 11 Apr 2019 00:00:00 +0000</pubDate><guid>https://www.quantargo.com/blog/post/2019-04-11-viennar-meetup-march-impressions</guid><category>R</category><category>Blog</category><description>&lt;h2&gt;Introduction&lt;/h2&gt;
&lt;img src="https://www.quantargo.com/assets/blog/2019-04-11-viennar-meetup-march-impressions/ViennaR Meetup Intro.svg"&gt;
&lt;p&gt;For all who couldn’t make it to our last &lt;a href="https://www.meetup.com/ViennaR"&gt;ViennaR Meetup&lt;/a&gt; on March 18, 2019 at Webster Vienna Private University, here is a short summary of the talks and takeaways.&lt;/p&gt;
&lt;p&gt;The introduction covered a short history of the &lt;a href="https://www.meetup.com/ViennaR"&gt;ViennaR Meetup&lt;/a&gt; and the special relationship between R and Vienna through the &lt;a href="https://www.r-project.org/foundation"&gt;R Foundation&lt;/a&gt;, two &lt;a href="https://www.r-project.org/contributors.html"&gt;R Core members&lt;/a&gt; (&lt;a href="https://www.wu.ac.at/statmath/faculty-staff/faculty/khornik/"&gt;Kurt Hornik&lt;/a&gt; and &lt;a href="https://boku.ac.at/en/rali/iasc/personen/friedrich-leisch"&gt;Fritz Leisch&lt;/a&gt;) and one of the first organized &lt;a href="http://www.ci.tuwien.ac.at/conferences.html"&gt;R conferences&lt;/a&gt;, the &lt;a href="http://www.ci.tuwien.ac.at/Conferences/DSC-1999"&gt;DSC 1999&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;You can download the slides of the introduction &lt;a href="ViennaR-Meetup-Intro.pdf"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;div id="ronald-hochreiter-mario-annau-walter-djuric" class="section level5"&gt;
&lt;h5&gt;
&lt;a href="https://www.linkedin.com/in/ronaldhochreiter/?originalSubdomain=at"&gt;Ronald Hochreiter&lt;/a&gt;, &lt;a href="https://www.linkedin.com/in/mario-annau-a68b2056/"&gt;Mario Annau&lt;/a&gt;, &lt;a href="https://www.linkedin.com/in/walterdjuric"&gt;Walter Djuric&lt;/a&gt;
&lt;/h5&gt;
&lt;/div&gt;
&lt;h2&gt;R-Ladies&lt;/h2&gt;
&lt;img src="https://www.quantargo.com/assets/blog/2019-04-11-viennar-meetup-march-impressions/rladies-map.png"&gt;
&lt;p&gt;Laura Vana introduced &lt;a href="https://www.meetup.com/rladies-vienna"&gt;R-Ladies Vienna&lt;/a&gt;, a local chapter of R-Ladies Global, an organisation supported by the R Consortium that aims to&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;achieve proportionate representation by encouraging, inspiring, and empowering the minorities currently underrepresented in the R community. R-Ladies’ primary focus, therefore, is on supporting the R enthusiasts who identify as an underrepresented minority to achieve their programming potential, by building a collaborative global network of R leaders, mentors, learners, and developers to facilitate individual and collective progress worldwide.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;from &lt;a href="https://rladies.org" class="uri"&gt;https://rladies.org&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Visit the &lt;a href="https://www.meetup.com/rladies-vienna"&gt;Meetup site&lt;/a&gt; to find out more about upcoming R-Ladies events including workshops covering an &lt;a href="https://www.meetup.com/rladies-vienna/events/260340072"&gt;R Introduction&lt;/a&gt; and &lt;a href="https://www.meetup.com/rladies-vienna/events/260534779"&gt;Bayesian Statistics&lt;/a&gt;.&lt;/p&gt;
&lt;div id="laura-vana" class="section level5"&gt;
&lt;h5&gt;&lt;a href="https://www.linkedin.com/in/laura-vana-4b92b940/"&gt;Laura Vana&lt;/a&gt;&lt;/h5&gt;
&lt;/div&gt;
&lt;h2&gt;Tidy(er) Data&lt;/h2&gt;
&lt;img src="https://www.quantargo.com/assets/blog/2019-04-11-viennar-meetup-march-impressions/tidyr.svg"&gt;
&lt;p&gt;Hadley Wickham’s talk covered the &lt;a href="https://tidyr.tidyverse.org"&gt;&lt;strong&gt;tidyr&lt;/strong&gt;&lt;/a&gt; package and two new functions, &lt;code&gt;pivot_wide()&lt;/code&gt; and &lt;code&gt;pivot_long()&lt;/code&gt;, which were eventually renamed to &lt;code&gt;pivot_wider()&lt;/code&gt; and &lt;code&gt;pivot_longer()&lt;/code&gt; as the result of a survey; see also the posts on &lt;a href="https://twitter.com/hadleywickham/status/1109816130774986753"&gt;Twitter&lt;/a&gt; and &lt;a href="https://github.com/hadley/table-shapes"&gt;GitHub&lt;/a&gt;. These functions are meant to replace &lt;code&gt;gather()&lt;/code&gt; and &lt;code&gt;spread()&lt;/code&gt;, which seem to be hard to remember for most users coming to &lt;strong&gt;tidyr&lt;/strong&gt;.&lt;/p&gt;
&lt;div id="hadley-wickham" class="section level5"&gt;
&lt;h5&gt;&lt;a href="https://www.linkedin.com/in/hadleywickham/"&gt;Hadley Wickham&lt;/a&gt;&lt;/h5&gt;
&lt;/div&gt;
&lt;h2&gt;First Impressions&lt;/h2&gt;
&lt;p&gt;Below you can find the first impressions of the Meetup on our new &lt;a href="https://www.youtube.com/channel/UCUckQKqk1l24RbemESLJPig/"&gt;YouTube channel&lt;/a&gt;. The full talks from Laura and Hadley are still being cut and optimized and will be released on our &lt;a href="https://www.youtube.com/channel/UCUckQKqk1l24RbemESLJPig/"&gt;YouTube channel&lt;/a&gt; by next week, so stay tuned and feel free to subscribe :-)&lt;/p&gt;
&lt;iframe width="560" height="315" src="https://www.youtube.com/embed/B4HsDBG2Xyk" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen&gt;
&lt;/iframe&gt;
&lt;p&gt;See you at one of our next Meetups/Courses/Get-Togethers!&lt;/p&gt;
&lt;p&gt;Cheers,&lt;/p&gt;
&lt;p&gt;Your Quantargo Team&lt;/p&gt;</description></item><item><title>ViennaR Meetup Announcement March 2019</title><link>https://www.quantargo.com/blog/post/2019-02-25-meetup-announcement</link><pubDate>Mon, 25 Feb 2019 00:00:00 +0000</pubDate><guid>https://www.quantargo.com/blog/post/2019-02-25-meetup-announcement</guid><category>R</category><category>Blog</category><description>&lt;h2&gt;ViennaR Meetup Announcement March 2019&lt;/h2&gt;
&lt;img src="https://www.quantargo.com/assets/blog/2019-02-25-meetup-announcement/logo2.png"&gt;
&lt;p&gt;For the next ViennaR Meetup on &lt;strong&gt;March 18&lt;/strong&gt; we are excited to announce &lt;strong&gt;Laura Vana&lt;/strong&gt; (&lt;a href="https://www.meetup.com/rladies-vienna"&gt;R-Ladies&lt;/a&gt;) and &lt;strong&gt;Hadley Wickham&lt;/strong&gt; (&lt;a href="https://www.rstudio.com/"&gt;RStudio&lt;/a&gt;). The meetup will take place at Webster Vienna Private University (&lt;a href="http://webster.ac.at" class="uri"&gt;http://webster.ac.at&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Registration is required at the &lt;a href="https://www.meetup.com/ViennaR/events/259235903/"&gt;Meetup Page&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://www.meetup.com/ViennaR/events/259235903/" class="uri"&gt;https://www.meetup.com/ViennaR/events/259235903/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Feel free to join the networking session with food and drinks afterwards. See you at the Meetup and happy R coding!&lt;/p&gt;
&lt;iframe src="https://www.google.com/maps/embed?pb=!1m14!1m8!1m3!1d10634.445127525405!2d16.3829412!3d48.2141028!3m2!1i1024!2i768!4f13.1!3m3!1m2!1s0x0%3A0xb613b511f9dcd07c!2sWebster+Vienna+Private+University!5e0!3m2!1sde!2sat!4v1551175268617" width="600" height="450" frameborder="0" style="border:0" allowfullscreen&gt;
&lt;/iframe&gt;
&lt;h2&gt;Laura Vana: R-Ladies&lt;/h2&gt;
&lt;img src="https://www.quantargo.com/assets/blog/2019-02-25-meetup-announcement/laura.jpg"&gt;
&lt;img src="https://www.quantargo.com/assets/blog/2019-02-25-meetup-announcement/R-Ladies.png"&gt;
&lt;p&gt;&lt;a href="https://at.linkedin.com/in/laura-vana-4b92b940"&gt;Laura&lt;/a&gt; is a postdoctoral researcher at WU Vienna and chapter lead of &lt;a href="https://www.meetup.com/rladies-vienna"&gt;R-Ladies Vienna&lt;/a&gt;. She will present ongoing initiatives and members of R-Ladies Vienna.&lt;/p&gt;
&lt;div id="about-r-ladies" class="section level3"&gt;
&lt;h3&gt;About R-Ladies&lt;/h3&gt;
&lt;p&gt;R-Ladies Vienna welcomes members of all R proficiency levels, whether you’re a new or aspiring R user, or an experienced R programmer interested in mentoring, networking &amp;amp; expert upskilling. Our community is designed to develop our members’ R skills &amp;amp; knowledge through social, collaborative learning &amp;amp; sharing. Supporting minority identity access to STEM skills &amp;amp; careers, the Free Software Movement, and contributing to the global R community! A local chapter of R-Ladies Global, R-Ladies Vienna exists to promote gender diversity in the R community worldwide. Please visit &lt;a href="https://www.meetup.com/rladies-vienna/" class="uri"&gt;https://www.meetup.com/rladies-vienna/&lt;/a&gt; for more information.&lt;/p&gt;
&lt;/div&gt;
&lt;h2&gt;Hadley Wickham&lt;/h2&gt;
&lt;img src="https://www.quantargo.com/assets/blog/2019-02-25-meetup-announcement/Hadley-wickham2016-02-04.jpg"&gt;
&lt;img src="https://www.quantargo.com/assets/blog/2019-02-25-meetup-announcement/packages.png"&gt;
&lt;p&gt;Hadley is chief scientist at RStudio and well known for his &lt;a href="https://github.com/hadley"&gt;contributions&lt;/a&gt; to the R community, which make the life of data scientists easier. He is known for the &lt;em&gt;tidyverse&lt;/em&gt;, which includes popular packages like &lt;a href="https://ggplot2.tidyverse.org"&gt;ggplot2&lt;/a&gt;, &lt;a href="https://dplyr.tidyverse.org"&gt;dplyr&lt;/a&gt; and &lt;a href="https://readr.tidyverse.org"&gt;readr&lt;/a&gt;, and for the software development package &lt;a href="https://CRAN.R-project.org/package=devtools"&gt;devtools&lt;/a&gt;. He is the author of numerous (online) books like &lt;a href="https://r4ds.had.co.nz/"&gt;R for Data Science&lt;/a&gt; and &lt;a href="https://adv-r.hadley.nz/"&gt;Advanced R&lt;/a&gt;. Last but not least, he won the John Chambers Award in 2006 for early versions of the &lt;a href="http://had.co.nz/reshape/"&gt;reshape&lt;/a&gt; and &lt;a href="https://cran.r-project.org/src/contrib/Archive/ggplot/"&gt;ggplot&lt;/a&gt; packages&lt;a href="#fn1" class="footnote-ref" id="fnref1"&gt;&lt;sup&gt;1&lt;/sup&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;div id="questions-for-the-qa-session" class="section level3"&gt;
&lt;h3&gt;Questions for the Q&amp;amp;A Session&lt;/h3&gt;
&lt;p&gt;If you want to ask Laura or Hadley any questions during the Q&amp;amp;A session or if you have some other question regarding the Meetup please e-mail your topic to &lt;a href="mailto:viennar@quantargo.com" class="email"&gt;viennar@quantargo.com&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;See you at the Meetup and happy R-coding!&lt;/p&gt;
&lt;/div&gt;</description></item><item><title>Why Management Loves Overfitting</title><link>https://www.quantargo.com/blog/post/2019-01-23-why-management-loves-overfitting</link><pubDate>Wed, 23 Jan 2019 00:00:00 +0000</pubDate><guid>https://www.quantargo.com/blog/post/2019-01-23-why-management-loves-overfitting</guid><category>R</category><category>Blog</category><description>&lt;h2&gt;Why Management Loves Overfitting&lt;/h2&gt;
&lt;img src="https://www.quantargo.com/assets/blog/2019-01-23-why-management-loves-overfitting/overfitting-classification.png"&gt;
&lt;p&gt;The role of a data scientist involves building and fine-tuning models to improve processes and products in various business areas. Typical use cases include marketing campaigns, customer churn prediction or fraud detection. Trained models should work not only on (seen) training data but also on new (unseen) real-world data. However, this requirement is typically not obvious to most of the decision makers involved, who tend to favour overfitted models and delude themselves with fabulous numbers and promises. The problems arise right after implementation, when the results do not follow suit. It is thus the task of every responsible data scientist to manage expectations properly and employ industry best practices as covered in our course on &lt;a href="https://www.quantargo.com/courses/course-r-machine-learning"&gt;Machine Learning with R&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;To see the problem of overfitting in action let’s look at a simple relationship in the famous &lt;code&gt;mtcars&lt;/code&gt; dataset between the weight of a car (wt, in 1000 lbs) and its fuel economy (mpg, miles per gallon). Obviously, the heavier the car, the fewer miles per gallon it goes (or the higher its fuel consumption). We have modeled the relationship using the &lt;code&gt;smooth.spline()&lt;/code&gt; function in R and exposed its smoothing parameter (&lt;code&gt;spar&lt;/code&gt;) through a slider. In &lt;code&gt;smooth.spline()&lt;/code&gt;, larger values of &lt;code&gt;spar&lt;/code&gt; mean more smoothing: a &lt;code&gt;spar&lt;/code&gt; close to one yields a smooth curve that seems to model the relationship quite well. By decreasing &lt;code&gt;spar&lt;/code&gt; the model begins to fit observations more closely, thus its variance is increased. Once &lt;code&gt;spar&lt;/code&gt; gets closer to zero the spline starts to lose its smooth shape and zig-zag, a sign of &lt;strong&gt;overfitting&lt;/strong&gt;.&lt;/p&gt;
&lt;img src="https://www.quantargo.com/assets/blog/2019-01-23-why-management-loves-overfitting/2019-01-23-why-management-loves-overfitting_files/figure-html/unnamed-chunk-2-1.png"&gt;
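&lt;p&gt;The effect can be reproduced without the slider. The following is a minimal sketch, not the code behind the figure: the two &lt;code&gt;spar&lt;/code&gt; values and the helper &lt;code&gt;rss()&lt;/code&gt; are chosen for illustration only. Note that in &lt;code&gt;smooth.spline()&lt;/code&gt; a larger &lt;code&gt;spar&lt;/code&gt; means a smoother fit.&lt;/p&gt;
&lt;pre&gt;# Fit two smoothing splines to the same data with different spar values
fit_smooth &amp;lt;- smooth.spline(mtcars$wt, mtcars$mpg, spar = 0.9)
fit_wiggly &amp;lt;- smooth.spline(mtcars$wt, mtcars$mpg, spar = 0.2)

# In-sample residual sum of squares of a fitted spline (illustrative helper)
rss &amp;lt;- function(fit) sum((mtcars$mpg - predict(fit, mtcars$wt)$y)^2)

rss(fit_smooth)  # larger in-sample error, smooth curve
rss(fit_wiggly)  # smaller in-sample error, curve chases single points

# Visual check: overlay both fits on the raw data
grid &amp;lt;- seq(min(mtcars$wt), max(mtcars$wt), length.out = 200)
plot(mtcars$wt, mtcars$mpg, xlab = "wt (1000 lbs)", ylab = "mpg")
lines(predict(fit_smooth, grid), col = "blue")
lines(predict(fit_wiggly, grid), col = "red")&lt;/pre&gt;
&lt;p&gt;The lower in-sample error of the wiggly fit is exactly the kind of number that looks impressive in a report but says little about performance on new cars.&lt;/p&gt;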
&lt;p&gt;The same phenomenon can be shown in a classification example. We use the basic K-nearest neighbour model to differentiate the three iris species among 150 flowers (50 per species) using the variables sepal length/width and petal length/width. The three classes can easily be differentiated visually into three areas. By moving the number of neighbours closer to one we increase model variance and observe that decision boundaries get fragmented.&lt;/p&gt;
&lt;img src="https://www.quantargo.com/assets/blog/2019-01-23-why-management-loves-overfitting/2019-01-23-why-management-loves-overfitting_files/figure-html/unnamed-chunk-3-1.png"&gt;
&lt;img src="https://www.quantargo.com/assets/blog/2019-01-23-why-management-loves-overfitting/2019-01-23-why-management-loves-overfitting_files/figure-html/unnamed-chunk-4-1.png"&gt;
&lt;img src="https://www.quantargo.com/assets/blog/2019-01-23-why-management-loves-overfitting/2019-01-23-why-management-loves-overfitting_files/figure-html/unnamed-chunk-5-1.png"&gt;
&lt;p&gt;Even if more observations can be correctly classified in-sample or, similarly, the regression error can be reduced, we should always keep in mind that model performance can only be judged on out-of-sample data. Thus, decision makers should pay much more attention to how the model has been selected than to how good the reported performance is. To be on the safe side, and if enough data is available, we can always keep a final test set aside (not available to model developers) to evaluate actual performance - pretty much like in a &lt;a href="https://www.kaggle.com/"&gt;Kaggle&lt;/a&gt; competition.&lt;/p&gt;
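&lt;p&gt;Keeping a test set aside can be sketched in a few lines. The example assumes the &lt;code&gt;class&lt;/code&gt; package (a recommended package shipped with R) and an arbitrary split of 100 training and 50 test flowers; the seed, the split size and the helper &lt;code&gt;accuracy()&lt;/code&gt; are illustrative choices, not the setup used for the figures above.&lt;/p&gt;
&lt;pre&gt;library(class)  # provides knn()

set.seed(42)                             # arbitrary seed for reproducibility
test_idx &amp;lt;- sample(nrow(iris), 50)    # hold out 50 of the 150 flowers
train_x &amp;lt;- iris[-test_idx, 1:4]
test_x  &amp;lt;- iris[test_idx, 1:4]
train_y &amp;lt;- iris$Species[-test_idx]
test_y  &amp;lt;- iris$Species[test_idx]

# Share of correctly classified flowers for a given number of neighbours k
accuracy &amp;lt;- function(k, newdata, truth) {
  pred &amp;lt;- knn(train = train_x, test = newdata, cl = train_y, k = k)
  mean(pred == truth)
}

# With k = 1 every training point is its own nearest neighbour,
# so the in-sample number looks (near) perfect by construction
accuracy(1, train_x, train_y)

# The honest numbers come from data the model has not seen
accuracy(1, test_x, test_y)
accuracy(15, test_x, test_y)&lt;/pre&gt;
&lt;p&gt;Only the last two numbers say anything about how the classifier would behave on new flowers.&lt;/p&gt;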
&lt;p&gt;So my final recommendations are:&lt;/p&gt;
&lt;ol style="list-style-type: decimal"&gt;
&lt;li&gt;Don’t fool yourself and be honest about out-of-sample data/performance.&lt;/li&gt;
&lt;li&gt;Manage the expectations of decision makers well - be realistic.&lt;/li&gt;
&lt;li&gt;If results look extremely good at your first try, they are most probably wrong.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Happy (Correct) Fitting!&lt;/p&gt;</description></item><item><title>Let's play together: Collaborative Data Science</title><link>https://www.quantargo.com/blog/post/2018-09-19-collaborative-data-science</link><pubDate>Wed, 19 Sep 2018 00:00:00 +0000</pubDate><guid>https://www.quantargo.com/blog/post/2018-09-19-collaborative-data-science</guid><category>R</category><category>Blog</category><description>&lt;h2&gt;Why is it so hard?&lt;/h2&gt;
&lt;img src="https://www.quantargo.com/assets/blog/2018-09-19-collaborative-data-science/wobble_graph_small.gif"&gt;
&lt;p&gt;From experience we’ve learned that most data science projects are not truly collaborative efforts but are driven by only a few key players. The best (public) examples are most of the open source R and Python packages available on GitHub. However, collaboration within data science teams can be &lt;em&gt;the&lt;/em&gt; determining factor driving innovation in a sustainable way. We highlight some common problems in data science projects and give guidance on how collaboration can be improved to facilitate a data-driven transformation in organisations.&lt;/p&gt;
&lt;img src="https://www.quantargo.com/assets/blog/2018-09-19-collaborative-data-science/The%20Righteous%20Mind_small.jpg"&gt;
&lt;p&gt;Data Science is an interdisciplinary field and requires diverse skill sets to deliver data products. On top of software and data engineering skills, a solid statistical background is needed to reveal interesting patterns and build models. However, we often see a clash of cultures between engineering and data science/modelling teams. While the former group typically cares more about code quality, testing and deployment, the latter is mostly focused on the correctness of methodology and data. The development processes are also quite different: Agile/Scrum vs. research/hypothesis driven.&lt;/p&gt;
&lt;p&gt;Last but not least, we see strong opinions and conflicts in data science teams. Most of them are about tools (R vs. Python), methodology (statistically rigorous vs. data mining/brute force) and project priorities. Data Science is a very new field, and the answers to most of these questions depend on the specific problem and the respective institutional/company background.&lt;/p&gt;
&lt;h2&gt;Why is it so important?&lt;/h2&gt;
&lt;img src="https://www.quantargo.com/assets/blog/2018-09-19-collaborative-data-science/2018-09-19-collaborative-data-science_files/figure-html/unnamed-chunk-1-1.png"&gt;
&lt;p&gt;Having a large and diverse group of people working in a relatively new and unstructured environment like a Data Science project can lead to great ideas and innovation - or to utter chaos. The line between the two is typically very thin, and the outcome can be &lt;strong&gt;positively influenced&lt;/strong&gt; if you have&lt;/p&gt;
&lt;ol style="list-style-type: decimal"&gt;
&lt;li&gt;Open team spirit and transparency generating new ideas.&lt;/li&gt;
&lt;li&gt;Teams working efficiently together on projects, reviewing each other’s ideas which are generated on a continuous basis - with room for failure.&lt;/li&gt;
&lt;li&gt;A well-managed code base which is maintained and reviewed, leading to increased re-usability and positive network effects.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Ingredients leading to &lt;strong&gt;adverse effects&lt;/strong&gt; are just the opposite:&lt;/p&gt;
&lt;ol style="list-style-type: decimal"&gt;
&lt;li&gt;Team rivalries and politically motivated decision making - fear of failure.&lt;/li&gt;
&lt;li&gt;Teams not communicating with each other, working on redundant projects.&lt;/li&gt;
&lt;li&gt;An unmanaged and unreviewed code base consisting of a handful of undocumented scripts/notebooks, which prevents re-use.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In general the question remains what kind of environment can be created - either from the technical or human resources side - to improve long-lasting positive network effects, or in particular:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;How can code be managed&lt;/strong&gt; to have positive network effects?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;How can teams&lt;/strong&gt; efficiently communicate and &lt;strong&gt;collaborate&lt;/strong&gt; together?&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Case study: The CRAN package repository&lt;/h2&gt;
&lt;img src="https://www.quantargo.com/assets/blog/2018-09-19-collaborative-data-science/2018-09-19-collaborative-data-science_files/figure-html/unnamed-chunk-2-1.png"&gt;
&lt;p&gt;To see the biggest (public) statistical code base in action, let’s take a look at the &lt;a href="https://cran.r-project.org"&gt;CRAN&lt;/a&gt; package repository, which has grown at an astonishing pace over the last decade. It hosts well over 10,000 R packages written by authors all over the world. A large part of its success is driven by the simple yet powerful package &lt;a href="https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Package-structure"&gt;structure&lt;/a&gt; inspired by the &lt;a href="https://www.debian.org"&gt;Debian&lt;/a&gt; Linux package system. Each package is checked for errors by the CRAN repository maintainers using &lt;code&gt;R CMD check --as-cran &amp;lt;packagename&amp;gt;&lt;/code&gt; and released for all major platforms: Windows, macOS and Linux. Even compiled (C++) code within R packages is checked with the Address Sanitizer (ASAN) and the Undefined Behavior Sanitizer (UBSAN); see also &lt;a href="https://cran.r-project.org/web/checks/check_issue_kinds.html"&gt;CRAN Package Check Issue Kinds&lt;/a&gt;. These and many more procedures lead to a code base that is easier to re-use and maintain; see also &lt;a href="https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Checking-packages"&gt;Writing R Extensions&lt;/a&gt; and Hadley’s more verbose &lt;a href="http://r-pkgs.had.co.nz/check.html"&gt;description&lt;/a&gt; of the &lt;code&gt;R CMD check&lt;/code&gt; workflow.&lt;/p&gt;
&lt;p&gt;All relevant package metadata was extracted with the function &lt;code&gt;tools:::CRAN_package_db()&lt;/code&gt;.&lt;/p&gt;
&lt;h2&gt;CRAN Package Network&lt;/h2&gt;
&lt;img src="https://www.quantargo.com/assets/blog/2018-09-19-collaborative-data-science/wobble_graph_small.gif"&gt;
&lt;img src="https://www.quantargo.com/assets/blog/2018-09-19-collaborative-data-science/2018-09-19-collaborative-data-science-plot1.png"&gt;
&lt;p&gt;R packages can also depend on other packages, as declared in the package &lt;a href="https://cran.r-project.org/doc/manuals/r-release/R-exts.html#The-DESCRIPTION-file"&gt;DESCRIPTION&lt;/a&gt; file through &lt;code&gt;Imports&lt;/code&gt; or &lt;code&gt;Depends&lt;/code&gt;. This makes proper check procedures and well-defined interfaces between packages even more important, since an error in one dependency can affect a large number of packages. The picture above shows the dependency graph of the most downloaded R packages on CRAN.&lt;/p&gt;
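&lt;p&gt;As a minimal sketch of how such dependency edges could be derived from the metadata table, the following R code parses the &lt;code&gt;Depends&lt;/code&gt; and &lt;code&gt;Imports&lt;/code&gt; fields into an edge list. It is not the code used for the plots in this post: the packages &lt;code&gt;pkgA&lt;/code&gt;, &lt;code&gt;pkgB&lt;/code&gt; and &lt;code&gt;pkgC&lt;/code&gt; and the helpers &lt;code&gt;parse_deps()&lt;/code&gt; and &lt;code&gt;dependency_edges()&lt;/code&gt; are hypothetical, and the toy data frame only mimics the &lt;code&gt;Package&lt;/code&gt;, &lt;code&gt;Depends&lt;/code&gt; and &lt;code&gt;Imports&lt;/code&gt; columns of the real table.&lt;/p&gt;

```r
# Toy stand-in for the CRAN metadata table (assumption: hypothetical packages).
# With internet access, tools::CRAN_package_db() returns a data frame
# that also contains the columns Package, Depends and Imports.
db = data.frame(
  Package = c("pkgA", "pkgB", "pkgC"),
  Depends = c("R (>= 3.5.0)", NA, "R (>= 3.5.0)"),
  Imports = c("pkgB (>= 1.0),\n pkgC", "pkgC", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split one dependency field into bare package names.
parse_deps = function(field) {
  if (is.na(field)) return(character(0))
  parts = strsplit(field, ",", fixed = TRUE)[[1]]
  parts = gsub("\\(.*\\)", "", parts)  # drop version constraints like (>= 1.0)
  parts = trimws(parts)                # drop spaces and line breaks
  setdiff(parts[nzchar(parts)], "R")   # R itself is not a package
}

# Hypothetical helper: one row per "package depends on package" relation.
dependency_edges = function(db, fields = c("Depends", "Imports")) {
  edges = lapply(seq_len(nrow(db)), function(i) {
    deps = unlist(lapply(fields, function(f) parse_deps(db[[f]][i])))
    deps = unique(as.character(deps))
    data.frame(from = rep(db$Package[i], length(deps)), to = deps,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, edges)
}

edges = dependency_edges(db)
print(edges)

# Number of direct reverse dependencies per package
print(table(edges$to))
```

&lt;p&gt;Counting how often a package appears in the &lt;code&gt;to&lt;/code&gt; column gives its number of direct reverse dependencies, which is one way to spot packages where an error would affect many others.&lt;/p&gt;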
&lt;img src="https://www.quantargo.com/assets/blog/2018-09-19-collaborative-data-science/2018-09-19-collaborative-data-science_files/figure-html/unnamed-chunk-4-1.png"&gt;
&lt;p&gt;Interestingly, we observe that the vast majority of packages are developed by a single author. There could be many reasons for this: the R package structure itself may suit single authors best, scientific studies are often conducted by only a few scientists, and the general social behavior of R programmers may play a role.&lt;/p&gt;
&lt;p&gt;However, the lack of communication between package authors and the lack of a clear overview of which packages actually exist can lead to &lt;strong&gt;redundant developments&lt;/strong&gt;. While a wide variety of packages for different model implementations can be helpful, it does not make much sense for infrastructure packages. A good example is the package universe dealing with &lt;strong&gt;Excel&lt;/strong&gt; files, including &lt;a href="https://CRAN.R-project.org/package=xlsx"&gt;xlsx&lt;/a&gt;, &lt;a href="https://CRAN.R-project.org/package=XLConnect"&gt;XLConnect&lt;/a&gt;, &lt;a href="https://CRAN.R-project.org/package=gdata"&gt;gdata&lt;/a&gt;, &lt;a href="https://CRAN.R-project.org/package=openxlsx"&gt;openxlsx&lt;/a&gt; and &lt;a href="https://CRAN.R-project.org/package=readxl"&gt;readxl&lt;/a&gt;. The graph below shows how these packages create different clusters of reverse dependencies. Some reverse dependencies even wrap the functionality of several Excel packages and act as connectors/wrappers, like &lt;a href="https://CRAN.R-project.org/package=DataLoader"&gt;DataLoader&lt;/a&gt; or &lt;a href="https://CRAN.R-project.org/package=ImportExport"&gt;ImportExport&lt;/a&gt;.&lt;/p&gt;
&lt;img src="https://www.quantargo.com/assets/blog/2018-09-19-collaborative-data-science/wobble_graph_small.gif"&gt;
&lt;h2&gt;Psychological Barriers&lt;/h2&gt;
&lt;img src="https://www.quantargo.com/assets/blog/2018-09-19-collaborative-data-science/wobble_graph_small.gif"&gt;
&lt;p&gt;Last but not least there exist very basic barriers for authors not collaborating with each other: &lt;strong&gt;EGO&lt;/strong&gt;. Not satisfied with most existing packages&lt;a href="#fn1" class="footnote-ref" id="fnref1"&gt;&lt;sup&gt;1&lt;/sup&gt;&lt;/a&gt; dealing with &lt;a href="https://en.wikipedia.org/wiki/Hierarchical_Data_Format"&gt;HDF5&lt;/a&gt;&lt;a href="#fn2" class="footnote-ref" id="fnref2"&gt;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; files to store high-frequency tick-data in a high-performance, language independent format I decided to create a new one: &lt;strong&gt;h5&lt;/strong&gt; (deprecated but still on &lt;a href="https://github.com/mannau/h5"&gt;Github&lt;/a&gt; and &lt;a href="https://CRAN.R-project.org/package=h5"&gt;CRAN&lt;/a&gt;).&lt;/p&gt;
&lt;img src="https://www.quantargo.com/assets/blog/2018-09-19-collaborative-data-science/data_exchange3.png"&gt;
&lt;p&gt;After having spent quite some time developing &lt;strong&gt;h5&lt;/strong&gt;, which was presented at &lt;a href="http://www.rinfinance.com/RinFinance2016/agenda"&gt;R/Finance 2016&lt;/a&gt;, I received an e-mail from Holger stating that he had also developed a package to tackle the problem:&lt;/p&gt;
&lt;p&gt;&lt;em&gt;On June 21, 2016 Holger wrote:&lt;/em&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;… my name is Holger Hoefling, I have developed a new version of a wrapper library for hdf5 (R6 Classes, almost all function calls wrapped, full support for all datatypes including tables etc) …&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Having overcome my own EGO barrier (which was quite hard) and after inspecting &lt;em&gt;his&lt;/em&gt; package, we agreed to work together on &lt;em&gt;one&lt;/em&gt; HDF5 package and &lt;strong&gt;merge codebases&lt;/strong&gt; (which sounds easier than it was) in order to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Maintain the high-level interface and test cases from &lt;strong&gt;h5&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Get low-level HDF5 support within R&lt;/li&gt;
&lt;/ul&gt;

&lt;img src="https://www.quantargo.com/assets/blog/2018-09-19-collaborative-data-science/merge-git.png"&gt;
&lt;h2&gt;The Joys of Collaboration&lt;/h2&gt;
&lt;p&gt;The joys of collaboration (after overcoming psychological barriers) are great and typically lead to longer-term projects, regular code reviews and, in my case, a merged package of higher quality than either of the previous ones.&lt;/p&gt;
&lt;p&gt;My recommendations are thus as follows:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Q: How can code be managed to have positive network effects?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;ul&gt;
&lt;li&gt;Put it into a re-usable package.&lt;/li&gt;
&lt;li&gt;Run continuous code reviews and tests.&lt;/li&gt;
&lt;li&gt;Use a transparent code platform (like &lt;a href="https://www.github.com"&gt;GitHub&lt;/a&gt;) that lets others inspect the source.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Q: How can teams efficiently communicate and collaborate together?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;ul&gt;
&lt;li&gt;Have the right tools and mindset in place.&lt;/li&gt;
&lt;li&gt;Incentivize collaborative efforts.&lt;/li&gt;
&lt;li&gt;Accept unexpected hypotheses and failures.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Open-mindedness&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;</description></item></channel></rss>