R Interview Questions

R is a programming language used for statistical analysis, graphical representation, and reporting. R is freely available under the GNU General Public License (GNU GPL), which gives end users the freedom to run, study, share, and modify the software. R is widely used for regression analysis, predictive modeling, probability estimation, and data mining, all of which support data analysis.

Features of R programming language:

  1. R provides effective facilities for handling and storing large amounts of data.
  2. R is handy for statistical analysis and graphical representation of data.
  3. R provides a large number of operators for calculations on arrays, lists, vectors, and matrices.

Here is a list of R interview questions to help you prepare for that interview and get a job in the field of data science.

Question: Which data structures in R help in statistical analysis and graphical representation?

Answer: The following data structures are widely used in R:

  1. Array
  2. Matrix
  3. Vector
  4. Data Frame
  5. List
  6. Tables

Question: How to print something in R?

Answer: To print a value, R uses the print() function.

> string_variable_name <- "R is an analytical language"
> print(string_variable_name)
[1] "R is an analytical language"

Question: What is the class() function in R?

Answer: class() returns a character vector giving the names of the classes from which the object inherits.

Example:

> x <- 1:10
> class(x)
[1] "integer"

Question: What is a vector?

Answer: A vector is a sequence of data elements of the same basic type. Members of a vector are called components.

Example.

> vector_example <- c(2, 3, 4, 5)
> print(vector_example)
[1] 2 3 4 5
> print(length(vector_example))
[1] 4

Question: How to perform arithmetic operations on vectors? Show with some examples.

Answer: R provides the usual arithmetic operators and applies them to vectors component by component. Let's look at some standard operators.

> x <- c(1, 2, 3, 4)
> y <- c(4, 5, 6, 7)
> x + y
[1]  5  7  9 11
> x - y
[1] -3 -3 -3 -3
> z <- c(4, 4, 4, 4, 4, 4, 4)
> x + z
[1] 5 6 7 8 5 6 7
Warning message:
In x + z : longer object length is not a multiple of shorter object length

When two vectors of unequal length are combined, the shorter vector is recycled (reused from its start) until it matches the length of the longer one. R issues a warning, as above, when the longer length is not a multiple of the shorter length.

Question: Define Index in Vector.

Answer: An index selects the element at a given position of a vector. Some programming languages start indexing at 0 and others at 1; R counts from 1. An index can take several forms:

1. Positive and in range index

> x <- c(1, 3, 4, 5)
> x[2]
[1] 3

2. Out of range

> x <- c(2, 3, 4, 5)
> x[110]
[1] NA

3. Negative index: removes the element at that position and returns all the remaining elements.

> x <- c(3, 4, 5, 6, 7)
> x[-3]
[1] 3 4 6 7

4. Range of values

> x <- c(3, 4, 5, 6, 7, 8)
> x[2:5]
[1] 4 5 6 7

5. Duplicate index

> x <- c(3, 4, 5, 6, 7)
> x[c(2, 1, 2, 3)]
[1] 4 3 4 5

6. Logical index

To select a particular group of positions, use a logical vector of TRUE and FALSE values; the elements at the TRUE positions are returned.

> x <- c(2, 3, 4, 5, 6)
> x[c(TRUE, FALSE, FALSE, TRUE, TRUE)]
[1] 2 5 6
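In practice, a logical index is usually produced by a comparison rather than typed by hand. A minimal sketch of that idea follows; the values and the variable names keep, big, and first_and_last are made up for illustration.

```r
x <- c(2, 3, 4, 5, 6)

keep <- x > 3          # logical vector: FALSE FALSE TRUE TRUE TRUE
big <- x[keep]         # elements at the TRUE positions
print(big)

# A positive index vector can be computed as well, e.g. first and last element
first_and_last <- x[c(1, length(x))]
print(first_and_last)
```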

Question: What is a list?

Answer: A list is a collection of objects that may be of different types. Suppose you have a numeric vector, a character vector, a logical vector, and a single number. Combining them into one object requires a container that allows mixed data types, so we create a list.

> n = c(2, 3, 5)
> s = c("a", "b", "c", "d", "e")
> b = c(TRUE, FALSE, TRUE, FALSE, FALSE)
> x = list(n, s, b, 3)
> print(x)
[[1]]
[1] 2 3 5

[[2]]
[1] "a" "b" "c" "d" "e"

[[3]]
[1]  TRUE FALSE  TRUE FALSE FALSE

[[4]]
[1] 3
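Once a list exists, its elements can be pulled out in more than one way. The sketch below is illustrative only: the element names n, s, and flag are hypothetical and chosen for this example.

```r
x <- list(n = c(2, 3, 5), s = c("a", "b"), flag = TRUE)

sub_list <- x[1]     # single brackets return a list of length 1
numbers  <- x[[1]]   # double brackets return the element itself
by_name  <- x$s      # named elements can also be reached with $

print(class(sub_list))
print(numbers)
print(by_name)
```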

Question: What is a matrix?

Answer: A matrix is a two-dimensional rectangular data set. It can be created using a vector input to the matrix function.

Example.

# Matrix creation
> M = matrix(c(1,2,3,4,5,6), nrow = 2, ncol = 3, byrow = TRUE)
> print(M)
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6

Where
nrow = number of rows in the matrix
ncol = number of columns in the matrix
byrow = TRUE fills the matrix row by row; FALSE (the default) fills it column by column.

Question: What is an Array?

Answer: An array is a generalization of a matrix. A matrix has exactly two dimensions, whereas an array can have any number of dimensions.

Example.

> a <- array(c("car", "bike"), dim = c(3, 3, 2))
> print(a)
, , 1

     [,1]   [,2]   [,3]
[1,] "car"  "bike" "car"
[2,] "bike" "car"  "bike"
[3,] "car"  "bike" "car"

, , 2

     [,1]   [,2]   [,3]
[1,] "bike" "car"  "bike"
[2,] "car"  "bike" "car"
[3,] "bike" "car"  "bike"

> my_array <- array(1:24, dim = c(3, 4, 2))
> my_array
, , 1

     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12

, , 2

     [,1] [,2] [,3] [,4]
[1,]   13   16   19   22
[2,]   14   17   20   23
[3,]   15   18   21   24

Question: What is a factor?

Answer: Factors are R objects that are created from a vector. A factor is stored as a vector of integer codes with a corresponding set of character labels (the levels) that are used when the factor is displayed. The only required argument to factor() is a vector of values, which is returned as a factor whose levels are the distinct values in that vector.

Factors are created using the factor() function. The nlevels() function gives the count of levels.

Example.

# First let's create a vector
> vector_example <- c('a', 'b', 'c', 'a', 'a')
# Now create a factor object
> factor_example <- factor(vector_example)
> print(factor_example)
[1] a b c a a
Levels: a b c
> print(nlevels(factor_example))
[1] 3

nlevels gives you the number of distinct values in the vector.
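The integer codes and the labels behind a factor can be inspected directly. This is a small sketch with made-up category names; the explicit levels argument is used here only to fix the order of the categories.

```r
sizes <- factor(c("low", "high", "medium", "low"),
                levels = c("low", "medium", "high"))

print(levels(sizes))         # the distinct categories, in the order given
codes <- as.integer(sizes)   # the underlying integer codes
print(codes)
print(table(sizes))          # number of observations per level
```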

Question: What is the difference between a matrix and an array?

Answer: A matrix can have only two dimensions, whereas an array can have any number of dimensions. A matrix is defined by its data, the number of rows, the number of columns, and whether the elements are filled row-wise or column-wise.

For an array, you give the dimensions as a vector. For example, a 3x3x2 array can be viewed as two matrices, each of dimension 3x3.
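The relationship between the two can be checked with dim(). The following is a minimal sketch using arbitrary integer data; the names m, a, and slice are illustrative.

```r
m <- matrix(1:6, nrow = 2, ncol = 3)
a <- array(1:18, dim = c(3, 3, 2))

print(dim(m))            # two dimensions
print(dim(a))            # three dimensions

slice <- a[, , 2]        # the second 3x3 slice of the array
print(is.matrix(slice))  # a 2-D slice of an array is a matrix
print(is.array(m))       # a matrix is itself a 2-D array
```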

Question: What is a data frame?

Answer: A data frame is a list of vectors of equal length. Each vector (column) has a single type, but different columns can have different types, so a data frame can have one logical column and another numeric column. The only condition is that all the vectors have the same length.

Example.

# This is how the data frame is created
> student_profile <- data.frame(
    Name = c("Ray", "Green", "Justin"),
    Age = c(22, 23, 24),
    Class = c(6, 7, 8)
  )
> print(student_profile)

The above code creates a data frame with three columns named Name, Age, and Class.

Question: What is the difference between a matrix and a data frame?

Answer: A data frame can contain columns of different types, and a matrix cannot. A data frame can hold character, integer, and even nested data frame columns, but every element of a matrix must be of the same type.

So, a data frame can combine character, numeric, and logical vectors.

A matrix, by contrast, holds only one data type.
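The difference shows up as soon as types are mixed. In this sketch (column names num and chr are made up), the matrix silently converts everything to character, while the data frame keeps each column's type. stringsAsFactors = FALSE is passed explicitly so the result is the same on older R versions.

```r
# Mixing numbers and strings in a matrix coerces every element to character
m <- cbind(c(1, 2), c("a", "b"))
print(typeof(m))

# A data frame keeps a separate type for each column
df <- data.frame(num = c(1, 2), chr = c("a", "b"), stringsAsFactors = FALSE)
col_classes <- sapply(df, class)
print(col_classes)
```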

Question: How to read input from the user in R?

Answer:

readinteger <- function()
{
  n <- readline(prompt = "Enter an integer: ")
  return(as.integer(n))
}
print(readinteger())

readline() lets the user enter a one-line string in R.

The prompt argument is the text shown to the user before the input is read.

Question: Write a function to get the square of a number

Answer:

Square <- function(x) {
return(x^2)
}
print(Square(4))

Question: Write a countdown function in R.

Answer:

countdown <- function(time)
{
  print(time)
  while (time != 0)
  {
    Sys.sleep(1)       # wait one second
    time <- time - 1
    print(time)
  }
}
countdown(5)
[1] 5
[1] 4
[1] 3
[1] 2
[1] 1
[1] 0

Question: How to calculate the mode in R?

Answer: The mode is the value that has the highest number of occurrences in a set of data. Unlike the mean and median, the mode can be computed for both numeric and character data.

R does not have a standard built-in function to calculate the statistical mode (the built-in mode() function returns the storage mode of an object instead). So we create a user-defined function to calculate the mode of a data set in R. This function takes a vector as input and returns the mode value as output.

Example.

#Create the function
getmode <- function(v){
uniqv <- unique(v)
uniqv[which.max(tabulate(match(v,uniqv)))]
}
#Create the vector with numbers.
v <- c(2,1,2,3,1,2,3,4,1,5,5,3,2,3)
#Calculate the mode using the user function
result <- getmode(v)
print(result)
[1] 2
#Create the vector with characters
charv <- c("o", "it", "the", "it", "it")
#Calculate the mode using the user function.
result <- getmode(charv)
print(result)
[1] "it"

Question: What does the unlist() function do?

Answer: It converts a list to a vector by flattening all of its elements.
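In base R the function that flattens a list into a vector is unlist(). A minimal sketch, using a hypothetical list my_list:

```r
my_list <- list(a = 1:3, b = c(10, 20))

flat <- unlist(my_list)   # one atomic vector holding every element
print(flat)               # element names are derived from the list names
print(is.vector(flat))
print(length(flat))
```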

Question: What is the apply function in R?

Answer: apply() and its family are among the most used functions in R. We use apply() when we want to apply a function to the rows or columns of a matrix. The second argument selects the margin: 1 for rows and 2 for columns.

Example:

M<- matrix(seq(1,16),4,4)
apply (M,1,min)
[1] 1 2 3 4

Question: What is the lapply() function in R?

Answer: The lapply() function is used when we want to apply a function to each element of a list or vector. It always returns a list of the same length as its input.

Question: Differentiate between lapply and sapply

Answer: If the programmer wants the output simplified to a vector or matrix where possible, sapply() is used, whereas if the programmer wants the output to always be a list, lapply() is used.
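The difference in return type is easiest to see side by side. This sketch applies mean() to a made-up list called scores; the names are illustrative only.

```r
scores <- list(math = c(80, 90), art = c(70, 75, 95))

as_list   <- lapply(scores, mean)   # always returns a list
as_vector <- sapply(scores, mean)   # simplified to a named numeric vector

print(class(as_list))
print(as_vector)
```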

Question: How to install a new package in R?

Answer: We need to know the name of the package and pass it, as a string, to install.packages().

Syntax:

install.packages("name_of_package")

Question: What is the function of merge() function?

Answer: We can merge two data frames by using the merge() function. The data frames must share the column names on which the merge happens.

Example.

df1 <- data.frame(id = c(1:6), name = c(rep("Amit", 3), rep("Sumit", 3)))
df2 <- data.frame(id = c(7, 8, 9), name = c(rep("Nitin", 2), rep("Paplu", 1)))
# outer join
merge(x = df1, y = df2, by = "id", all = TRUE)

Setting all = TRUE gives the outer join, so the new data set has all the rows from both data frames, matched on id.

Question: What is data cleaning?

Answer: Data cleaning is a process in analytics that involves removing or amending data in a database that is incorrect, incomplete, improperly formatted, or duplicated.
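Two common cleaning steps, removing duplicate rows and removing rows with missing values, can be sketched in base R as follows. The data frame raw and its columns are invented for this example.

```r
raw <- data.frame(id  = c(1, 2, 2, 3, 4),
                  age = c(25, 31, 31, NA, 40))

no_dupes <- raw[!duplicated(raw), ]   # drop exact duplicate rows
clean    <- na.omit(no_dupes)         # drop rows that contain NA

print(clean)
```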

Question: What is data reshaping?

Answer: Sometimes, we need data in a particular format. Initially, we import the data from a .csv or .txt file into a data frame. Most of the time, however, the analysis needs a data set laid out differently from the initial one: columns may have to be added, removed, or repositioned. Data reshaping is the process of rearranging the initial data frame into the shape the analysis needs.

Question: Write a function to add two numbers in R

Answer:

add <- function(a,b)
{
c <- a+b
print(c)
}

Question: How can R be closed from the command line?

Answer: Use the function q()

Question: How to read the csv_input file in R?

Answer:

data <- read.csv("csv_input.csv")

Question: Explain the use of the scan function in R.

Answer: The scan() function reads data into a vector or list from the console or from a file. Its arguments can be customized to read specific types of data. When reading from the console, it waits for input from the user and then returns the values entered at the prompt.
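Because console input is awkward to show in a script, the sketch below feeds scan() its input through the text argument instead; quiet = TRUE only suppresses the "Read N items" message. The input strings are made up.

```r
# Numeric input is the default
values <- scan(text = "3 5 8 13", quiet = TRUE)

# The what argument sets the type of data to read
words <- scan(text = "red green blue", what = character(), quiet = TRUE)

print(values)
print(words)
```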

Question: What are the different file formats used in the R programming language?

Answer:

  1. .rda (.RData) files: These store R objects written with save(); they are read back with load() or attach().
  2. .R files: These are R source files; the dump() function writes text representations of R objects to such a file.
  3. .txt files: The .txt files are used to store datasets. R reads and writes them with the read.table() and write.table() functions.
  4. .csv files: The comma-separated values files are common data files.

Question: What is the function of summary()?

Answer: summary() is an important command that gives a statistical summary of the data. For numeric data, it reports the minimum, 1st quartile, median, mean, 3rd quartile, and maximum.
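A quick sketch on a small numeric vector (the values are arbitrary) shows the six statistics that summary() reports for numeric data:

```r
marks <- c(10, 20, 30, 40, 50)

s <- summary(marks)
print(s)
print(names(s))   # labels of the reported statistics
```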

Question: How can you add datasets in R?

Answer: The rbind() function can be used to combine datasets in R by appending rows, provided the datasets have the same columns.
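As a minimal illustration, two small data frames with identical columns can be stacked like this; the names jan, feb, and all_sales and their contents are hypothetical.

```r
jan <- data.frame(id = 1:2, sales = c(100, 150))
feb <- data.frame(id = 3:4, sales = c(120, 90))

all_sales <- rbind(jan, feb)   # rows of feb are appended below rows of jan
print(all_sales)
```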

Question: What are the factor variables in the R language?

Answer: Factor variables are categorical variables that hold either string or numeric values. Factor variables are used in various types of graphics and particularly for statistical modeling, where the correct number of degrees of freedom is assigned to them.

Question: What is the use of the seq() function in R?

Answer: The seq() function in R generates a sequence of numbers. If we need a sequence with a particular step, e.g., 4, 8, 12, 16, we supply the additional argument by, which sets the step size.

Example.

> print(seq(5, 11, by = 2))
[1]  5  7  9 11

Question: Define repeat loop

Answer: A repeat loop executes a sequence of statements again and again. The repeat keyword itself takes no condition; the exit condition is checked inside the loop body, and break ends the loop.

Example.

> name <- c("Parry", "John")
> temp <- 5
> repeat {
    print(name)
    temp <- temp + 2
    if (temp > 11) {
      break
    }
  }

This prints the name vector four times. On the first pass, it prints name and increases temp to 7, and so on, until temp exceeds 11.

Question: How can one perform decision making in R?

Answer: Decision making in R is performed in the same way as in other languages. The three main decision-making statements are:

  1. if statement
  2. if...else statement
  3. switch statement
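The statements listed above can be sketched as follows. The functions grade and describe, the threshold of 50, and the strings they return are all invented for illustration.

```r
# if...else: choose between two branches
grade <- function(score) {
  if (score >= 50) {
    "pass"
  } else {
    "fail"
  }
}

# switch: choose among several named alternatives
describe <- function(kind) {
  switch(kind,
         mean = "average value",
         median = "middle value",
         "unknown statistic")   # unnamed last argument acts as the default
}

print(grade(72))
print(describe("median"))
```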

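Question: Given two vectors, a <- c(3,4,5) and b <- c(1,2), what will be the output for c <- a * b?

Answer: c is c(3, 8, 5). The shorter vector b is recycled, and R issues a warning because the length of a is not a multiple of the length of b.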
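This result can be reproduced directly. In the sketch below, suppressWarnings() is used only to keep the output tidy, since R warns that the longer length is not a multiple of the shorter one; the variable is named result rather than c to avoid masking the c() function.

```r
a <- c(3, 4, 5)
b <- c(1, 2)

# b is recycled as 1, 2, 1 to match the length of a
result <- suppressWarnings(a * b)
print(result)
```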

Question: What are the data objects in R on which binary operators can be applied?

Answer: Scalars, Matrices, and Vectors

Question: What are the main characteristics of a data frame?

Answer: The following are the main characteristics:

  1. Row names should be unique.
  2. Column names should be non-empty.
  3. The data stored in a data frame is typically of numeric, factor, or character type.
  4. Each column should contain the same number of data items. This is one of the main rules of data frames.

Question: Explain the use of the str() function in R.

Answer: The str() function in R is used to get the structure of an object such as a data frame, along with the first few observations. Suppose a data frame has three variables, each with three values. Then the output of this function will look like:

'data.frame':	3 obs. of  3 variables:
 $ name : chr  "Nitin" "Kamal" "Xtramous"
 $ age  : int  16 18 20
 $ class: int  6 8 10

Question: What is the difference between seq(4) and seq_along(4)

Answer: seq(4) produces the vector from 1 to 4, c(1, 2, 3, 4), whereas seq_along(4) produces a sequence as long as its argument. The argument 4 is a vector of length 1, so the result is c(1).
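The distinction matters most when looping over a vector, where seq_along() gives one index per element. A short sketch with an arbitrary vector v:

```r
v <- c(10, 20, 30)

print(seq(4))          # 1 2 3 4
print(seq_along(4))    # 1, because the argument has length 1
print(seq_along(v))    # 1 2 3, one index per element of v

# For an empty vector seq_along() is empty, so a for loop over it never runs
empty <- numeric(0)
print(length(seq_along(empty)))
```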

Question: How to read a .csv file in R?

Answer: The read.csv() function reads a CSV (Comma Separated Values) file into a data frame. A relative file name is looked up in the current working directory.

Example.

data_store <- read.csv("abc.csv")
print(data_store)

Question: Get all the data of the person having a maximum salary.

Answer:

max_salary_person <- subset(data,
salary == max(salary))
print(max_salary_person)

Question: How to get outer join, left join, right join, inner join, and cross join?

Answer:

outer join - merge(x = df1, y = df2, by = "id", all = TRUE)
left join - merge(x = df1, y = df2, by = "id", all.x = TRUE)
right join - merge(x = df1, y = df2, by = "id", all.y = TRUE)
inner join - merge(x = df1, y = df2, by = "id")
cross join - merge(x = df1, y = df2, by = NULL)

Question: What do you mean by casting? What is the use of cast() function?

Answer: Casting aggregates data that has been reshaped into long format with melt(). Once the data is melted, if we want to aggregate the values for rows with the same company_name and age, we use the cast() function from the reshape package (the newer reshape2 package provides dcast() for the same purpose).

Example.

Casted_data_set <- cast(new_data_set, company_name+age ~ variable, sum)

The function gives the aggregate salary and number of children for each combination of company and age.

Question: What is the use of sample function in R programming?

Answer: The sample() function can be used to select a random sample of size n from a dataset.

Question: What is the use of subset function in R programming?

Answer: The subset() function is used to select variables and observations from a given dataset.
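Both functions can be tried on a small made-up data frame. In this sketch the seed value 42, the data frame df, and the score threshold are arbitrary choices for illustration.

```r
set.seed(42)   # arbitrary seed so the random draw is repeatable

df <- data.frame(id = 1:10, score = seq(10, 100, by = 10))

# sample(): draw 3 ids at random, without replacement
picked <- sample(df$id, size = 3)
print(picked)

# subset(): keep rows with score > 70 and only the id column
high <- subset(df, score > 70, select = id)
print(high)
```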

Question: What is the function of the rnorm() function? Explain with syntax.

Answer: The rnorm() function generates n normally distributed random numbers with the mean and standard deviation passed to the function.

Syntax:

rnorm(n, mean = 0, sd = 1)
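A rough check of the arguments: with a large n, the sample mean and standard deviation should land close to the requested values. The seed and the parameter values below are arbitrary.

```r
set.seed(1)   # arbitrary seed for a repeatable draw

draws <- rnorm(1000, mean = 50, sd = 10)

print(length(draws))
print(mean(draws))   # close to 50, but not exactly 50
print(sd(draws))     # close to 10, but not exactly 10
```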

Question: How to make a scatterplot in R?

Answer: A scatterplot is a graph that shows many points plotted in the Cartesian plane. Each point represents a pair of values, one on the x-axis and one on the y-axis. A simple scatterplot is drawn with the plot() function.

The syntax for scatterplot is:

plot(x,y,main,xlab,ylab,xlim,ylim,axes)

Where

x is the data set whose values are the horizontal coordinates

y is the data set whose values are the vertical coordinates

main is the title of the graph

xlab and ylab are the labels of the horizontal and vertical axes

xlim and ylim are the limits of the values of x and y used in the plotting

axes indicates whether both axes should be drawn on the plot.

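# input is assumed to be a data frame with wt and mpg columns
# (for example, the built-in mtcars data set)
plot(x = input$wt, y = input$mpg,
     xlab = "Weight",
     ylab = "Mileage",
     xlim = c(2.5, 5),
     ylim = c(15, 30),
     main = "Weight vs Mileage"
)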

Question: What is the sink function in R?

Answer: The sink() function redirects R output to a file or connection.

# direct output to a file
sink("myfile", append = FALSE, split = FALSE)
# return output to the terminal
sink()

The append option controls whether output overwrites (FALSE) or adds to (TRUE) the file. The split option determines whether the output is sent to the screen as well as to the output file.
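A round trip makes the behavior concrete: redirect output, print something, restore the console, and read the file back. This sketch writes to a temporary file created with tempfile() so that it leaves nothing behind in the working directory; the variable names are illustrative.

```r
out_file <- tempfile(fileext = ".txt")

sink(out_file)                  # start sending output to the file
print("written to the file")
sink()                          # restore output to the console

captured <- readLines(out_file) # the printed line now lives in the file
print(captured)
```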

Summary

We have provided you with the popular R interview questions for your preparations for the data science interview. We also recommend that you practice coding before you appear for an interview, and having a dummy project that you have worked on is always a plus. Do you have any other questions that you have come across in your interview? Or any other tips that you would like to share with the R community?

Comment below!

Simran Kaur Arora

Simran, born in Delhi, did her schooling and graduation in Computer Science in India. Her curiosity and passion for technology urged her to study for an MS in the same field in Silicon Valley, California, USA. After graduating in 2017, she flew back to India and now works for hackr.io as a freelance technical writer.
