Databricks Certified Machine Learning Professional Exam Questions
Are you preparing for the Databricks Certified Machine Learning Professional exam? PassQuestion provides up-to-date Databricks Certified Machine Learning Professional exam questions that cover the key topics and concepts you need to master to earn the certification. Working through these questions helps you prepare with confidence and improves your chances of passing the Databricks Certified Machine Learning Professional exam.
Databricks Certified Machine Learning Professional
The Databricks Certified Machine Learning Professional certification exam assesses an individual’s ability to use Databricks Machine Learning and its capabilities to perform advanced machine learning tasks in production. This includes the ability to track, version, and manage machine learning experiments and to manage the machine learning model lifecycle. The exam also assesses the ability to implement strategies for deploying machine learning models and to build monitoring solutions that detect data drift. Individuals who pass this certification exam can be expected to perform advanced machine learning engineering tasks using Databricks Machine Learning.
Exam Information
Type: Proctored certification
Number of items: 60 multiple-choice questions
Time limit: 120 minutes
Registration fee: $200
Languages: English
Delivery method: Online proctored
Prerequisites: None, but related training highly recommended
Recommended experience: 1+ years of hands-on experience performing the machine learning tasks outlined in the exam guide
Validity period: 2 years
Recertification: Required every two years (from the issue date) to maintain your certification status
Exam Sections and Objectives
Section 1: Experimentation – 30%
Data Management
● Read and write a Delta table
● View Delta table history and load a previous version of a Delta table
● Create, overwrite, merge, and read Feature Store tables in machine learning workflows
Experiment Tracking
● Manually log parameters, models, and evaluation metrics using MLflow
● Programmatically access and use data, metadata, and models from MLflow experiments
Advanced Experiment Tracking
● Perform MLflow experiment tracking workflows using model signatures and input examples
● Identify the requirements for tracking nested runs
● Describe the process of enabling autologging, including with the use of Hyperopt
● Log and view artifacts like SHAP plots, custom visualizations, feature data, images, and metadata
Section 2: Model Lifecycle Management – 30%
Preprocessing Logic
● Describe an MLflow flavor and the benefits of using MLflow flavors
● Describe the advantages of using the pyfunc MLflow flavor
● Describe the process and benefits of including preprocessing logic and context in custom model classes and objects
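To make the preprocessing objective concrete, here is a minimal sketch of a custom model class that applies its preprocessing inside predict. It deliberately does not import MLflow so it runs anywhere: ModelWithPreprocess, its weights, and the scaler_config artifact are hypothetical, and the SimpleNamespace object stands in for the context MLflow passes in. In a real project the class would subclass mlflow.pyfunc.PythonModel and be logged with mlflow.pyfunc.log_model(python_model=..., artifacts=...), where context.artifacts maps each artifact name to a local file path.

```python
import json
import os
import tempfile
from types import SimpleNamespace


class ModelWithPreprocess:
    """Hypothetical stand-in for a subclass of mlflow.pyfunc.PythonModel."""

    def __init__(self, weights, bias):
        # Fitted parameters of a simple linear model (illustrative values).
        self.weights = weights
        self.bias = bias
        self.means = None
        self.stds = None

    def load_context(self, context):
        # MLflow calls load_context when the model is loaded; context.artifacts
        # maps artifact names to local file paths.
        with open(context.artifacts["scaler_config"]) as f:
            config = json.load(f)
        self.means = config["means"]
        self.stds = config["stds"]

    def _preprocess(self, row):
        # Standardize each feature with the stored means and standard deviations.
        return [(x - m) / s for x, m, s in zip(row, self.means, self.stds)]

    def predict(self, context, model_input):
        # The same preprocessing runs on every call, so callers send raw features.
        # model_input is a list of rows here; MLflow typically passes a pandas DataFrame.
        return [
            sum(w * x for w, x in zip(self.weights, self._preprocess(row))) + self.bias
            for row in model_input
        ]


with tempfile.TemporaryDirectory() as tmp:
    config_path = os.path.join(tmp, "scaler_config.json")
    with open(config_path, "w") as f:
        json.dump({"means": [10.0, 2.0], "stds": [2.0, 0.5]}, f)
    context = SimpleNamespace(artifacts={"scaler_config": config_path})
    model = ModelWithPreprocess(weights=[2.0, 1.0], bias=0.5)
    model.load_context(context)
    preds = model.predict(context, [[12.0, 2.5], [10.0, 2.0]])
```

Because the preprocessing lives inside predict, anyone who loads the logged model gets it automatically, with no separate pipeline step to keep in sync.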
Model Management
● Describe the basic purpose and user interactions with Model Registry
● Programmatically register a new model or new model version
● Add metadata to a registered model and a registered model version
● Identify, compare, and contrast the available model stages
● Transition, archive, and delete model versions
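The stage rules in this list can be modeled with a toy, in-memory registry. This is not MLflow code: the transition function and the dictionary of versions are hypothetical, written to mirror our understanding of MlflowClient.transition_model_version_stage, whose archive_existing_versions flag moves versions already in the target stage to Archived and is only valid for Staging and Production. Recent MLflow releases deprecate stages in favor of model version aliases, so treat this as exam-era behavior.

```python
VALID_STAGES = ("None", "Staging", "Production", "Archived")


def transition(versions, version, stage, archive_existing_versions=False):
    """Toy model of a registry stage transition; `versions` maps version -> stage."""
    if stage not in VALID_STAGES:
        raise ValueError(f"unknown stage: {stage}")
    if archive_existing_versions:
        if stage not in ("Staging", "Production"):
            raise ValueError("archive_existing_versions only applies to Staging/Production")
        # Move any other version currently in the target stage to Archived.
        for other, current in versions.items():
            if other != version and current == stage:
                versions[other] = "Archived"
    versions[version] = stage
    return versions


registry = {1: "Production", 2: "Staging", 3: "None"}
# Promote version 2 and archive the version that was previously in Production.
transition(registry, 2, "Production", archive_existing_versions=True)
```

Deleting a version is a separate registry operation (MlflowClient.delete_model_version) rather than another stage.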
Model Lifecycle Automation
● Identify the role of automated testing in ML CI/CD pipelines
● Describe how to automate the model lifecycle using Model Registry Webhooks and Databricks Jobs
● Identify advantages of using Job clusters over all-purpose clusters
● Describe how to create a Job that triggers when a model transitions between stages, given a scenario
● Describe how to connect a Webhook with a Job
● Identify which code block will trigger a given webhook
● Identify a use case for HTTP webhooks and where the webhook URL needs to come from
● Describe how to list all webhooks and how to delete a webhook
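The webhook objectives can be illustrated by building the REST requests by hand. The sketch below assumes the Databricks Model Registry Webhooks REST API (the /api/2.0/mlflow/registry-webhooks/create, /list, and /delete endpoints, a preview feature when this exam guide was written); the workspace URL, token, job ID, model name, and Slack URL are placeholders. The requests are built with the standard library but never sent, so check field and event names against the current Databricks documentation before relying on them.

```python
import json
import urllib.request
from urllib.parse import urlencode

WORKSPACE_URL = "https://example.cloud.databricks.com"  # placeholder workspace
TOKEN = "dapi-EXAMPLE"  # placeholder token; load real tokens from a secret store
WEBHOOKS_API = f"{WORKSPACE_URL}/api/2.0/mlflow/registry-webhooks"


def _request(method, endpoint, body=None, query=None):
    # Build (but do not send) an authenticated request to the webhooks API.
    url = f"{WEBHOOKS_API}/{endpoint}"
    if query:
        url += "?" + urlencode(query)
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        url,
        data=data,
        method=method,
        headers={"Authorization": f"Bearer {TOKEN}", "Content-Type": "application/json"},
    )


def create_job_webhook(model_name, job_id):
    # Job webhook: run an existing Databricks Job when a version moves to Staging.
    return _request("POST", "create", {
        "model_name": model_name,
        "events": ["MODEL_VERSION_TRANSITIONED_TO_STAGING"],
        "job_spec": {"job_id": job_id, "workspace_url": WORKSPACE_URL, "access_token": TOKEN},
    })


def create_http_webhook(model_name, url):
    # HTTP webhook: POST the event to an external URL, e.g. a Slack
    # incoming-webhook URL generated in the Slack app configuration.
    return _request("POST", "create", {
        "model_name": model_name,
        "events": ["MODEL_VERSION_TRANSITIONED_STAGE"],
        "http_url_spec": {"url": url},
    })


def list_webhooks(model_name):
    # List the webhooks registered for one model.
    return _request("GET", "list", query={"model_name": model_name})


def delete_webhook(webhook_id):
    # Delete a webhook by the ID returned when it was created.
    return _request("DELETE", "delete", {"id": webhook_id})
```

Any of these requests could be sent with urllib.request.urlopen(req). Job webhooks suit work that runs inside Databricks, such as starting a test Job when a model is registered; HTTP webhooks suit external systems such as Slack, where the webhook URL comes from the receiving service.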
Section 3: Model Deployment – 25%
Batch
● Describe batch deployment as the appropriate choice for the vast majority of deployment use cases
● Identify how batch deployment computes predictions and saves them somewhere for later use
● Identify live serving benefits of querying precomputed batch predictions
● Identify less performant data storage as a solution for other use cases
● Load registered models with load_model
● Deploy a single-node model in parallel using spark_udf
● Identify z-ordering as a solution for reducing the amount of time to read predictions from a table
● Identify partitioning on a common column to speed up querying
● Describe the practical benefits of using the score_batch operation
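Why Z-ordering helps can be shown with a toy model of data skipping. The sketch below is an illustration, not Delta Lake's implementation: it sorts a 16 x 16 grid of hypothetical (a, b) rows either linearly or by a Z-order (Morton) key that interleaves the bits of both columns, cuts the result into 16-row "files", and counts how many files a filter on one column must read when each file's min/max range is checked first.

```python
def morton_key(row, bits=16):
    # Z-order (Morton) key: interleave the bits of the two column values.
    a, b = row
    key = 0
    for i in range(bits):
        key |= ((a >> i) & 1) << (2 * i)
        key |= ((b >> i) & 1) << (2 * i + 1)
    return key


def linear_key(row):
    # Plain sort: by column a, then column b.
    return row


def files_read(rows, sort_key, rows_per_file, column, value):
    # Sort rows, cut them into fixed-size "files", and count the files whose
    # min/max range for `column` could contain `value` (the rest are skipped).
    ordered = sorted(rows, key=sort_key)
    files = [ordered[i:i + rows_per_file] for i in range(0, len(ordered), rows_per_file)]
    return sum(
        1 for f in files
        if min(r[column] for r in f) <= value <= max(r[column] for r in f)
    )


rows = [(a, b) for a in range(16) for b in range(16)]
results = {
    "linear, filter a == 5": files_read(rows, linear_key, 16, 0, 5),
    "linear, filter b == 9": files_read(rows, linear_key, 16, 1, 9),
    "z-order, filter a == 5": files_read(rows, morton_key, 16, 0, 5),
    "z-order, filter b == 9": files_read(rows, morton_key, 16, 1, 9),
}
```

A linear sort is ideal for the leading column (1 of 16 files) but useless for the second (all 16 files), while the Z-order layout reads 4 of 16 files for either filter. In Delta this clustering is requested with OPTIMIZE <table> ZORDER BY (col1, col2), and partitioning on a common filter column complements it by pruning whole partitions.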
Streaming
● Describe Structured Streaming as a common processing tool for ETL pipelines
● Identify Structured Streaming as a continuous inference solution on incoming data
● Describe why complex business logic must be handled in streaming deployments
● Identify that data can arrive out of order with Structured Streaming
● Identify continuous predictions in a time-based prediction store as a scenario for streaming deployments
● Convert the inference step of a batch deployment pipeline to a streaming deployment pipeline
● Convert the write step of a batch deployment pipeline to a streaming deployment pipeline
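Out-of-order arrival is handled in Structured Streaming with event-time windows and watermarks (withWatermark). The plain-Python sketch below imitates that idea in simplified form and is a toy rather than Spark's exact semantics: it counts events per 10-unit tumbling window and drops any event more than delay units behind the latest event time seen so far. Spark advances the watermark between micro-batches and only guarantees that data within the threshold is kept.

```python
from collections import defaultdict


def windowed_counts(arrivals, window, delay):
    """Count events per tumbling event-time window with a simplified watermark.

    `arrivals` lists event times in the order the events arrive, which may
    differ from event-time order.
    """
    counts = defaultdict(int)
    dropped = []
    max_event_time = None
    for event_time in arrivals:
        watermark = None if max_event_time is None else max_event_time - delay
        if watermark is not None and event_time < watermark:
            dropped.append(event_time)  # arrived too far behind the watermark
            continue
        counts[event_time // window * window] += 1  # key = start of the event's window
        max_event_time = event_time if max_event_time is None else max(max_event_time, event_time)
    return dict(counts), dropped


counts, dropped = windowed_counts([1, 3, 12, 7, 25, 4, 16], window=10, delay=10)
```

Here the late event 7 is still counted in window 0 because it arrives within the threshold, while 4 arrives after the watermark has already moved past it and is dropped.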
Real-time
● Describe the benefits of using real-time inference for a small number of records or when fast prediction computations are needed
● Identify just-in-time (JIT) feature values as a need for real-time deployment
● Describe how Model Serving deploys an endpoint for every stage
● Identify how Model Serving uses one all-purpose cluster for a model deployment
● Query a Model Serving-enabled model in the Production stage and Staging stage
● Identify how cloud-provided RESTful services in containers are the best solution for production-grade real-time deployments
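Querying a served model per stage can be sketched as below. It assumes the URL pattern of classic (legacy) Databricks MLflow Model Serving, /model/<model-name>/<stage-or-version>/invocations, and a dataframe_records JSON body as accepted by MLflow 2.x scoring servers; newer Databricks Model Serving endpoints use /serving-endpoints/<endpoint-name>/invocations instead. The workspace URL, token, model name, and feature names are placeholders, and the requests are built but not sent.

```python
import json
import urllib.request
from urllib.parse import quote

WORKSPACE_URL = "https://example.cloud.databricks.com"  # placeholder workspace
TOKEN = "dapi-EXAMPLE"  # placeholder token


def scoring_request(model_name, stage, records):
    # Classic Model Serving exposes one endpoint per stage (and per version).
    url = f"{WORKSPACE_URL}/model/{quote(model_name)}/{stage}/invocations"
    body = json.dumps({"dataframe_records": records}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {TOKEN}", "Content-Type": "application/json"},
    )


records = [{"tenure_months": 12, "monthly_spend": 49.5}]  # hypothetical features
production_req = scoring_request("churn-model", "Production", records)
staging_req = scoring_request("churn-model", "Staging", records)
# To send: json.load(urllib.request.urlopen(production_req))
```

Only the stage segment of the URL changes between the Production and Staging queries, which is what makes side-by-side comparison of two stages straightforward.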
Section 4: Solution and Data Monitoring – 15%
Drift Types
● Compare and contrast label drift and feature drift
● Identify scenarios in which feature drift and/or label drift are likely to occur
● Describe concept drift and its impact on model efficacy
Drift Tests and Monitoring
● Describe summary statistic monitoring as a simple solution for numeric feature drift
● Describe mode, unique values, and missing values as simple solutions for categorical feature drift
● Describe tests as more robust monitoring solutions for numeric feature drift than simple summary statistics
● Describe tests as more robust monitoring solutions for categorical feature drift than simple summary statistics
● Compare and contrast Jensen-Shannon divergence and Kolmogorov-Smirnov tests for numerical drift detection
● Identify a scenario in which a chi-square test would be useful
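The numeric and categorical tests named above can be written out in plain Python to show what each one measures. This is a teaching sketch; in practice you would use library implementations such as scipy.stats.ks_2samp and scipy.stats.chi2_contingency, which also return p-values, or scipy.spatial.distance.jensenshannon (which returns the Jensen-Shannon distance, the square root of the divergence). Numeric features must be binned on shared edges before computing Jensen-Shannon divergence; the histogram helper here is a hypothetical one that clamps out-of-range values into the outer bins.

```python
import bisect
import math
from collections import Counter


def ks_statistic(reference, current):
    # Two-sample Kolmogorov-Smirnov statistic: the largest vertical gap
    # between the two empirical CDFs (checked at every observed value).
    ref, cur = sorted(reference), sorted(current)
    gap = 0.0
    for x in ref + cur:
        f_ref = bisect.bisect_right(ref, x) / len(ref)
        f_cur = bisect.bisect_right(cur, x) / len(cur)
        gap = max(gap, abs(f_ref - f_cur))
    return gap


def histogram(values, edges):
    # Bin numeric values on shared edges; out-of-range values go to the outer bins.
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = min(max(bisect.bisect_right(edges, v) - 1, 0), len(counts) - 1)
        counts[i] += 1
    return [c / len(values) for c in counts]


def js_divergence(p, q):
    # Jensen-Shannon divergence in bits (0 = identical, 1 = disjoint supports).
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(x, y):
        return sum(xi * math.log2(xi / yi) for xi, yi in zip(x, y) if xi > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def chi_square_statistic(reference_counts, current_counts):
    # Chi-square test of homogeneity on category counts (Counter objects,
    # so missing categories count as 0); returns (statistic, degrees of freedom).
    categories = sorted(set(reference_counts) | set(current_counts))
    n_ref = sum(reference_counts.values())
    n_cur = sum(current_counts.values())
    stat = 0.0
    for c in categories:
        column_total = reference_counts[c] + current_counts[c]
        for observed, n in ((reference_counts[c], n_ref), (current_counts[c], n_cur)):
            expected = n * column_total / (n_ref + n_cur)
            stat += (observed - expected) ** 2 / expected
    return stat, len(categories) - 1
```

KS works directly on raw numeric values and becomes sensitive to tiny shifts on large samples, whereas Jensen-Shannon divergence is symmetric and bounded (between 0 and 1 in base 2) but depends on the binning. Chi-square fits the case of comparing category frequencies, for example checking whether the mix of a categorical feature changed between training and production.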
Comprehensive Drift Solutions
● Describe a common workflow for measuring concept drift and feature drift
● Identify when retraining and deploying an updated model is a probable solution to drift
● Test whether the updated model performs better on the more recent data
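The last step, checking that a retrained model beats the current one on recent data, comes down to scoring both models on the same recent holdout. The sketch below uses RMSE, and plain functions stand in for the two model versions; the models, the data, and the min_improvement margin (an assumed guard against promoting on noise) are all hypothetical. In a Databricks workflow the resulting decision would typically gate a Model Registry stage transition.

```python
import math


def rmse(y_true, y_pred):
    # Root mean squared error between observed labels and predictions.
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def should_promote(current_model, candidate_model, recent_x, recent_y, min_improvement=0.0):
    # Score both models on the same recent data; promote only if the candidate
    # beats the current model by more than min_improvement.
    current_rmse = rmse(recent_y, [current_model(x) for x in recent_x])
    candidate_rmse = rmse(recent_y, [candidate_model(x) for x in recent_x])
    return candidate_rmse < current_rmse - min_improvement, current_rmse, candidate_rmse


def current_model(x):
    # Hypothetical model trained before the drift.
    return 2 * x


def candidate_model(x):
    # Hypothetical model retrained on recent data.
    return 2 * x + 1


decision = should_promote(current_model, candidate_model, [0, 1, 2, 3], [1, 3, 5, 7])
```

Keeping the evaluation data fixed for both models is the important design choice: comparing each model on its own training-era data would hide exactly the drift that motivated retraining.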
Databricks Certified Machine Learning Professional Free Practice Questions
1. Which of the following Databricks-managed MLflow capabilities is a centralized model store?
A. Models
B. Model Registry
C. Model Serving
D. Feature Store
E. Experiments
Answer: B
2. A machine learning engineer wants to log and deploy a model as an MLflow pyfunc model. They have custom preprocessing that needs to be completed on feature variables prior to fitting the model or computing predictions using that model. They decide to wrap this preprocessing in a custom model class ModelWithPreprocess, where the preprocessing is performed when calling fit and when calling predict. They then log the fitted model of the ModelWithPreprocess class as a pyfunc model.
Which of the following is a benefit of this approach when loading the logged pyfunc model for downstream deployment?
A. The pyfunc model can be used to deploy models in a parallelizable fashion
B. The same preprocessing logic will automatically be applied when calling fit
C. The same preprocessing logic will automatically be applied when calling predict
D. This approach has no impact when loading the logged pyfunc model for downstream deployment
E. There is no longer a need for pipeline-like machine learning objects
Answer: C
3. Which of the following MLflow Model Registry use cases requires the use of an HTTP Webhook?
A. Starting a testing job when a new model is registered
B. Updating data in a source table for a Databricks SQL dashboard when a model version transitions to the Production stage
C. Sending an email alert when an automated testing Job fails
D. None of these use cases require the use of an HTTP Webhook
E. Sending a message to a Slack channel when a model version transitions stages
Answer: E
4. Which of the following lists all of the model stages that are available in the MLflow Model Registry?
A. Development, Staging, Production
B. None, Staging, Production
C. Staging, Production, Archived
D. None, Staging, Production, Archived
E. Development, Staging, Production, Archived
Answer: D
5. A machine learning engineer needs to deliver predictions of a machine learning model in real time. However, the feature values needed for computing the predictions are available one week before the query time.
Which of the following is a benefit of using a batch serving deployment in this scenario rather than a real-time serving deployment where predictions are computed at query time?
A. Batch serving has built-in capabilities in Databricks Machine Learning
B. There is no advantage to using batch serving deployments over real-time serving deployments
C. Computing predictions in real time provides more up-to-date results
D. Testing is not possible in real-time serving deployments
E. Querying stored predictions can be faster than computing predictions in real time
Answer: E
6. Which of the following describes the purpose of the context parameter in the predict method of Python models for MLflow?
A. The context parameter allows the user to specify which version of the registered MLflow Model should be used based on the given application’s current scenario
B. The context parameter allows the user to document the performance of a model after it has been deployed
C. The context parameter allows the user to include relevant details of the business case to allow downstream users to understand the purpose of the model
D. The context parameter allows the user to provide the model with completely custom if-else logic for the given application’s current scenario
E. The context parameter allows the user to provide the model access to objects like preprocessing models or custom configuration files
Answer: E
7. A machine learning engineering team has written predictions computed in a batch job to a Delta table for querying. However, the team has noticed that the querying is running slowly. The team has already tuned the size of the data files. Upon investigating, the team has concluded that the rows meeting the query condition are sparsely located throughout each of the data files.
Based on the scenario, which of the following optimization techniques could speed up the query by colocating similar records while considering values in multiple columns?
A. Z-Ordering
B. Bin-packing
C. Write as a Parquet file
D. Data skipping
E. Tuning the file size
Answer: A