PySpark Basics – Quiz

This quiz contains 25 questions, each worth 1 point.

1. What is PySpark?
A library for Python
A distributed processing system
A database system
A machine learning platform

2. What does RDD stand for in PySpark?
Resilient Distributed Data
Random Distributed Data
Resilient Distributed Dataset
Random Distributed Dataset

3. Which of the following is NOT a feature of RDDs?
Fault-tolerant
Immutable
Stored on a single machine
Parallel processing

4. What does the 'resilient' in Resilient Distributed Dataset (RDD) mean?
Data is spread across multiple nodes
Data can be recovered in case of failure
Data can be processed in parallel
Data is mutable

5. What is the primary function of RDD Transformations?
Perform computation on the data
Create a new RDD from the existing one
Both of the above
None of the above

6. Which of the following is NOT an RDD action in PySpark?
reduce()
collect()
map()
count()

7. What does the 'collect()' action do on a PySpark RDD?
It collects the data from the RDD to the driver program
It distributes the data from the driver program to the RDD
It performs a map operation on the RDD
None of the above

8. What does persistence (or caching) of RDDs mean in PySpark?
Storing RDDs permanently
Storing RDDs in memory for faster access on repeated use
Distributing RDDs across nodes
All of the above

9. Which of the following functions can be used to create an RDD from a text file in PySpark?
sc.textFile()
sc.parallelize()
sc.map()
sc.reduce()

10. Which of the following is a key characteristic of Key-Value RDDs?
They can be used to perform distributed computation
They can be used to perform operations that are based on a key
They are used for storing large amounts of data
They are used for data sharing across different applications

11. Which of the following is NOT a component of the PySpark architecture?
Driver Program
Cluster Manager
Executors
Indexer

12. What is the role of the Driver Program in PySpark?
Distributing the data
Storing the data
Running the main() function and creating RDDs
Performing computations on the data

13. What is the function of Executors in PySpark?
Distributing and scheduling tasks
Running computations and storing the data
Managing the cluster resources
Coordinating with the Driver Program

14. What is the role of the Cluster Manager in PySpark?
Running the computations on the data
Creating RDDs
Managing and allocating resources
Storing the data

15. Which of the following RDD transformations returns a new RDD by applying a function to each element of the RDD?
map()
filter()
reduce()
collect()

16. What does the RDD transformation 'filter()' do?
It applies a function to each element of the RDD
It returns a new RDD containing only the elements that satisfy a predicate
It reduces the data in the RDD to a single value
It collects the data from the RDD to the driver program

17. What does the RDD action 'reduce()' do?
It applies a function to each element of the RDD
It returns a new RDD containing only the elements that satisfy a predicate
It reduces the data in the RDD to a single value
It collects the data from the RDD to the driver program

18. What is the result of the 'count()' action on a PySpark RDD?
It returns the total number of elements in the RDD
It returns the unique elements in the RDD
It returns the first element of the RDD
It returns the last element of the RDD

19. Which of the following methods is used to persist an RDD in memory in PySpark?
cache()
persist()
save()
Both cache() and persist()

20. Which of the following operations can be performed on Key-Value RDDs?
reduceByKey()
groupByKey()
sortByKey()
All of the above

21. What is the output of the 'first()' action on a PySpark RDD?
It returns the total number of elements in the RDD
It returns the unique elements in the RDD
It returns the first element of the RDD
It returns the last element of the RDD

22. What does the 'flatMap()' transformation do on a PySpark RDD?
It applies a function to each element of the RDD and flattens the result
It applies a function to each element of the RDD without changing the structure
It filters the elements of the RDD based on a predicate
It aggregates the elements of the RDD

23. What does the 'take(n)' action do on a PySpark RDD?
It returns the first 'n' elements of the RDD
It returns the last 'n' elements of the RDD
It takes 'n' random elements from the RDD
None of the above

24. What is a Pair RDD?
An RDD that contains key-value pairs
An RDD that contains only values
An RDD that contains only keys
An RDD that contains duplicate elements

25. In the context of PySpark, what is a 'Partition'?
A method to store data in memory
A segment of a large distributed dataset (RDD)
A type of transformation operation
A type of action operation
