SnowPro Advanced: Data Scientist Exam Guide



Registration Opens: October 22nd, 2021 




The SnowPro Advanced: Data Scientist Certification exam will test advanced knowledge and skills used to apply comprehensive data science principles, tools, and methodologies using Snowflake. 

This certification will test the ability to:

  • Outline data science concepts
  • Implement Snowflake data science best practices 
  • Prepare data and feature engineering in Snowflake
  • Train and use machine learning models
  • Use data visualization to present a business case (e.g., model explainability)
  • Implement model lifecycle management



Candidates should have 2+ years of practical data science experience with Snowflake in an enterprise environment.


In addition, successful candidates may have:

  • A statistical, mathematical, or science education (or equivalent work experience)

  • A background working with one or more programming languages (e.g., Python, R, SQL, PySpark)

  • Experience modeling and using machine learning platforms (e.g., SageMaker, Azure Machine Learning, GCP AI platform, AutoML tools, etc.)

  • An understanding of various open source and commercial frameworks and libraries (e.g., scikit-learn, TensorFlow, etc.)

  • Experience preparing, cleaning, and transforming data sets from multiple sources

  • Experience creating features for machine learning training

  • Experience validating and interpreting models 

  • Experience putting a model into production and monitoring the model in production


Target Audience: 

  • Data Scientists, AI/ML Engineers, Quantitative Researchers



Number of Questions: 65

Unscored Content: Exams may include unscored items to gather statistical information for future use. These items are not identified on the form and do not impact your score, and additional time is factored in to account for this content.

Question Types: 

  • Multiple Select 
  • Multiple Choice

Time Limit: 115 minutes

Languages: English

Passing Score: 750+ (scaled score from 0 to 1000)

Registration Fee: $375 USD

Delivery Options: 

  1. Online Proctoring - Webassessor
  2. Onsite Kryterion Testing Centers

Prerequisites: SnowPro Core Certified

Click here for information on scheduling your exam.



This exam guide includes test domains, weightings, and objectives. It is not a comprehensive listing of all the content that will be presented on this examination. The table below lists the main content domains and their weighting ranges. 



Domain                                        Percentage of Exam Questions

1.0 Data Science Concepts                     10% - 15%
2.0 Data Pipelining                           15% - 20%
3.0 Data Preparation and Feature Engineering  25% - 30%
4.0 Model Development                         25% - 30%
5.0 Model Deployment                          15% - 20%


1.0 Domain: Data Science Concepts

1.1 Define machine learning concepts for data science workloads. 

  • Artificial intelligence

  • Machine Learning

    • Supervised learning

    • Unsupervised learning

    • Reinforcement learning

    • Deep learning

1.2 Outline machine learning problem types.

  • Supervised Learning

    • Structured Data

    • Unstructured Data

  • Unsupervised Learning

    • Clustering

1.3 Summarize the machine learning lifecycle.

  • Data Collection

  • Data Visualization and Exploration

  • Feature engineering 

  • Training models

  • Model deployment

  • Model monitoring and evaluation


1.4 Outline data governance for data science.

  • Dynamic data masking

  • Row level security

  • Role Based Access Control (RBAC)


1.5 Outline statistical concepts for data science.

  • Normal distribution

  • Central limit theorem

  • Z and T tests

  • Bootstrapping

  • Confidence intervals
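Two of the concepts above, bootstrapping and confidence intervals, can be sketched together in plain Python. This is an illustrative example only (the function name and sample data are hypothetical), not code from the exam:

```python
import random
import statistics

def bootstrap_ci(data, n_resamples=2000, alpha=0.05, seed=42):
    """Percentile bootstrap confidence interval for the mean:
    resample with replacement, collect the resample means, and
    take the empirical alpha/2 and 1-alpha/2 percentiles."""
    rng = random.Random(seed)
    means = sorted(
        statistics.mean(rng.choices(data, k=len(data)))
        for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

sample = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3, 12.2, 11.7]
low, high = bootstrap_ci(sample)  # 95% CI around the sample mean
```

The percentile bootstrap makes no normality assumption, which is why it pairs naturally with the central limit theorem topics listed above.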


1.6 Define model governance for data science.

  • Model versioning
  • Lineage
  • Model explainability


2.0 Domain: Data Pipelining 

2.1 Source and collect data into Snowflake from multiple sources.

  • Data loading
  • Snowpipe
  • External tables
  • Materialized views
  • Streams 
  • Tasks
  • Connecting ETL tools (Connectors)

2.2 Enrich data by consuming data sharing sources.

  • Snowflake Data Marketplace
  • Direct Sharing
  • Shared database considerations

2.3 Create a development environment (e.g., sandbox) and maintain the environment.

  • Cloning
  • Levels or hierarchy
  • Automation to keep dataset updated
  • Time Travel

2.4 Build a data science pipeline.

  • Automation of data transformation
  • Streams and tasks
  • Functions
  • Stored procedures  
  • Connect Snowflake to machine learning platforms (e.g., connectors, ML partners, etc.)   


3.0 Domain: Data Preparation and Feature Engineering

3.1 Prepare and clean the data for analysis in Snowflake.

3.2 Perform feature engineering on Snowflake data.


  • Preprocessing 

  • Data transformations

  • Data Frames 

  • Snowpark 

  • Binarizing data

  • Binning continuous data into intervals

  • Label encoding

  • One hot encoding

  • Time Travel
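Two of the feature engineering techniques listed above, binning continuous data into intervals and one-hot encoding, can be sketched in plain Python. This is a hypothetical illustration of the concepts, not Snowflake-specific code:

```python
import bisect

def one_hot(values):
    """Map each categorical value to a binary indicator vector,
    one position per distinct category (sorted for determinism)."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

def bin_continuous(values, edges):
    """Assign each continuous value the index of the interval it
    falls in, given sorted bin edges."""
    return [bisect.bisect_right(edges, v) for v in values]

colors = ["red", "blue", "red", "green"]
encoded = one_hot(colors)  # columns ordered: blue, green, red

ages = [3, 17, 25, 64, 80]
bins = bin_continuous(ages, [18, 35, 65])  # 3 edges -> 4 intervals
```

In practice these transformations are often expressed in SQL or with library encoders; the sketch only shows what each operation produces.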

3.3 Perform exploratory data analysis in Snowflake.


  • Snowsight and SQL 

    • Identify initial patterns

    • Connect external machine learning platforms and/or notebooks (e.g., Jupyter)

  • Use Snowflake native statistical functions to analyze and calculate descriptive data statistics.

  • Window Functions



  • TOPN

  • Approximation/High Performing functions 

  • Linear Regression 

    • Find the slope and intercept

    • Verify the relationship between dependent and independent variables


4.0 Domain: Model Development

4.1 Connect data science tools directly to data in Snowflake.

  • Connectors

    • Python connector with pandas support

    • Spark connector

    • R connector

  • Snowflake Best Practices

    • One platform, one copy of data, many workloads

    • Enrich datasets using the data marketplace

    • Streams and Tasks

    • External tables 

    • External functions to trigger training

    • Zero-copy cloning for training snapshots

    • Materialized views for training and prediction

    • Snowflake SQL for aggregation and sampling


4.2 Train a data science model. 

  • Hyperparameter tuning 

  • Optimization metric selection 

  • Partitioning

  • Down/Up-sampling
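Partitioning, one of the training topics above, is commonly done as a random train/test split. A minimal sketch in plain Python (the function and data are hypothetical, for illustration only):

```python
import random

def train_test_split(rows, test_frac=0.25, seed=0):
    """Randomly partition rows into a training set and a held-out
    test set; a fixed seed keeps the split reproducible."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_frac))
    return shuffled[:cut], shuffled[cut:]

rows = list(range(100))
train, test = train_test_split(rows)  # 75 / 25 split
```

Down- and up-sampling follow the same pattern: draw without replacement to shrink the majority class, or with replacement to grow the minority class.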


4.3 Validate a data science model.

  • ROC curve/confusion matrix

  • Regression problems

  • Residuals plot

  • Model metrics
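The confusion matrix mentioned above, and the metrics derived from it, can be sketched for the binary case in plain Python (hypothetical labels, for illustration only):

```python
def confusion_matrix(actual, predicted):
    """2x2 counts for a binary classifier:
    true/false positives and false/true negatives."""
    tp = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
    fp = sum(a == 0 and p == 1 for a, p in zip(actual, predicted))
    fn = sum(a == 1 and p == 0 for a, p in zip(actual, predicted))
    tn = sum(a == 0 and p == 0 for a, p in zip(actual, predicted))
    return tp, fp, fn, tn

actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]
tp, fp, fn, tn = confusion_matrix(actual, predicted)

precision = tp / (tp + fp)   # of predicted positives, how many were right
recall    = tp / (tp + fn)   # of actual positives, how many were found
```

An ROC curve generalizes this idea by recomputing the true- and false-positive rates at every classification threshold.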


4.4 Interpret a model. 

  • Feature impact

  • Partial dependence plots

  • Confidence intervals


5.0 Domain: Model Deployment

5.1 Move a data science model into production. 

  • Deploy an external hosted model

  • Deploy a model in Snowflake


5.2 Score the effectiveness of a model and retrain if necessary.  

  • Metrics for model evaluation

  • External functions 

  • User defined functions (UDFs)

  • Storing predictions

  • Use Snowsight for distribution comparisons

5.3 Outline model lifecycle and validation tools. 

  • Streams and Tasks

  • Metadata tagging

  • Partner model versioning

  • Automation of model retraining



We recommend individuals have at least two years of hands-on Snowflake experience in a data science role prior to attempting this exam. The exam assesses skills through scenario-based questions and real-world examples. As preparation for this exam, we recommend a combination of hands-on experience, instructor-led training, and self-study assets.


Instructor-led Course recommended for this exam:

Free Self-Study recommended for this exam:


Ready to register? Click here for information on scheduling your exam.