Practice DP-203 Exam Online

Question 1

You are designing a financial transactions table in an Azure Synapse Analytics dedicated SQL pool. The table will have a clustered columnstore index and will include the following columns:

✑ TransactionType: 40 million rows per transaction type

✑ CustomerSegment: 4 million rows per customer segment

✑ TransactionMonth: 65 million rows per month

✑ AccountType: 500 million rows per account type

You have the following query requirements:

✑ Analysts will most commonly analyze transactions for a given month.

✑ Transaction analysis will typically summarize transactions by transaction type, customer segment, and/or account type.

You need to recommend a partition strategy for the table to minimize query times.

On which column should you recommend partitioning the table?

A : CustomerSegment

B : AccountType

C : TransactionType

D : TransactionMonth

Answer: D
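
Explanation sketch: the reasoning behind the answer can be checked with a little arithmetic. The snippet below assumes the commonly cited guidance that a dedicated SQL pool spreads every table across 60 distributions and that a clustered columnstore index works best with at least 1 million rows per distribution and partition. The constant and function names are illustrative only, not part of any Azure SDK.

```python
# Rule-of-thumb check (assumed guidance, not an official sizing tool):
# how many rows land in each distribution if the table is partitioned
# on a given column?

DISTRIBUTIONS = 60            # assumption: fixed distribution count of a dedicated SQL pool
MIN_ROWS_PER_SLICE = 1_000_000  # assumption: suggested minimum per distribution and partition

# Row counts per distinct value, taken from the question text.
rows_per_value = {
    "TransactionType": 40_000_000,
    "CustomerSegment": 4_000_000,
    "TransactionMonth": 65_000_000,
    "AccountType": 500_000_000,
}


def rows_per_distribution(rows_in_partition: int) -> float:
    """Average rows each distribution holds for one partition."""
    return rows_in_partition / DISTRIBUTIONS


def meets_columnstore_minimum(rows_in_partition: int) -> bool:
    """True when one partition still gives each distribution enough rows."""
    return rows_per_distribution(rows_in_partition) >= MIN_ROWS_PER_SLICE


report = {col: meets_columnstore_minimum(n) for col, n in rows_per_value.items()}

for col, ok in report.items():
    per_dist = rows_per_distribution(rows_per_value[col])
    print(f"{col}: {per_dist:,.0f} rows per distribution, large enough: {ok}")
```

Under these assumptions only TransactionMonth and AccountType keep each partition above the minimum. Because analysts most commonly filter on a given month, partitioning on TransactionMonth also lets the engine skip partitions that a query does not need.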

Question 2

You are implementing a batch dataset in the Parquet format.

Data files will be produced by using Azure Data Factory and stored in Azure Data Lake Storage Gen2. The files will be consumed by an Azure Synapse Analytics serverless SQL pool.

You need to minimize storage costs for the solution.

What should you do?

A : Use Snappy compression for the files.

B : Use OPENROWSET to query the Parquet files.

C : Create an external table that contains a subset of columns from the Parquet files.

D : Store all data as string in the Parquet files.

Answer: A

Question 3

You have an Azure Databricks workspace named workspace1 in the Standard pricing tier.

You need to configure workspace1 to support autoscaling all-purpose clusters. The solution must meet the following requirements:

✑ Automatically scale down workers when the cluster is underutilized for three minutes.

✑ Minimize the time it takes to scale to the maximum number of workers.

✑ Minimize costs.

What should you do first?

A : Enable container services for workspace1.

B : Upgrade workspace1 to the Premium pricing tier.

C : Set Cluster Mode to High Concurrency.

D : Create a cluster policy in workspace1.

Answer: B
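
Explanation sketch: once the workspace is on a tier that supports optimized autoscaling, the scale-down requirement is usually expressed in the cluster definition. The snippet below only builds a request body of the kind accepted by the Databricks Clusters API; it does not call the service. The helper name, cluster name, runtime version, and node type are placeholder assumptions, and the Spark property shown is the one Databricks documents as controlling how often down-scaling decisions are made.

```python
import json


def build_cluster_spec(min_workers: int, max_workers: int, scale_down_seconds: int) -> dict:
    """Build an illustrative autoscaling cluster definition (hypothetical helper)."""
    return {
        "cluster_name": "autoscaling-demo",      # placeholder name
        "spark_version": "13.3.x-scala2.12",     # placeholder runtime version
        "node_type_id": "Standard_DS3_v2",       # placeholder Azure VM size
        "autoscale": {
            "min_workers": min_workers,
            "max_workers": max_workers,
        },
        "spark_conf": {
            # Spark configuration values are passed as strings.
            "spark.databricks.aggressiveWindowDownS": str(scale_down_seconds),
        },
    }


# Three minutes of underutilization expressed in seconds.
spec = build_cluster_spec(min_workers=2, max_workers=8, scale_down_seconds=3 * 60)
print(json.dumps(spec, indent=2))
```

The worker counts here are arbitrary; the point is that the autoscale range and the scale-down window live in the cluster definition, while the pricing tier decides whether optimized autoscaling is available at all.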

Question 4

You are planning a streaming data solution that will use Azure Databricks. The solution will stream sales transaction data from an online store. The solution has the following specifications:

✑ The output data will contain items purchased, quantity, line total sales amount, and line total tax amount.

✑ Line total sales amount and line total tax amount will be aggregated in Databricks.

✑ Sales transactions will never be updated. Instead, new rows will be added to adjust a sale.

You need to recommend an output mode for the dataset that will be processed by using Structured Streaming. The solution must minimize duplicate data.

What should you recommend?

A : Update

B : Complete

C : Append

Answer: A
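
Explanation sketch: the difference between the Update and Complete output modes can be illustrated without Spark. The snippet below is a plain-Python simulation, not the Structured Streaming API. It keeps a running total per item and records what each mode would write after every micro-batch: Update writes only the rows whose aggregate changed, Complete rewrites the whole result table. The function name and sample data are made up for illustration.

```python
from collections import defaultdict


def run_stream(batches):
    """Simulate per-item running totals and the rows each output mode would emit."""
    totals = defaultdict(float)
    update_out, complete_out = [], []
    for batch in batches:
        changed = set()
        for item, amount in batch:
            totals[item] += amount
            changed.add(item)
        # Update mode: only aggregates touched by this micro-batch.
        update_out.append({k: totals[k] for k in sorted(changed)})
        # Complete mode: the entire result table, every time.
        complete_out.append(dict(sorted(totals.items())))
    return update_out, complete_out


batches = [
    [("apple", 10.0), ("pear", 5.0)],  # original sales
    [("apple", -2.0)],                 # a new row that adjusts an earlier sale
]

update_out, complete_out = run_stream(batches)

update_rows = sum(len(rows) for rows in update_out)
complete_rows = sum(len(rows) for rows in complete_out)
print("update mode wrote", update_rows, "rows")
print("complete mode wrote", complete_rows, "rows")
```

In this toy run the unchanged "pear" total is written again in Complete mode but not in Update mode, which is the duplicate output the question asks to minimize.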

Question 5

You are designing an Azure Databricks interactive cluster. The cluster will be used infrequently and will be configured for auto-termination.

You need to ensure that the cluster configuration is retained indefinitely after the cluster is terminated. The solution must minimize costs.

What should you do?

A : Pin the cluster.

B : Create an Azure runbook that starts the cluster every 90 days.

C : Terminate the cluster manually when processing completes.

D : Clone the cluster after it is terminated.

Answer: A

Name:	Data Engineering on Microsoft Azure
Exam Code:	DP-203
Certification:	Azure Data Engineer Associate
Vendor:	Microsoft
Total Questions:	359
Last Updated:	Apr 22, 2024
