SnowPro Advanced: Data Scientist Exam Guide


SNOWPRO ADVANCED: DATA SCIENTIST EXAM DETAILS

Registration Opens: October 22, 2021

SNOWPRO ADVANCED: DATA SCIENTIST OVERVIEW

The SnowPro Advanced: Data Scientist Certification exam will test advanced knowledge and skills used to apply comprehensive data science principles, tools, and methodologies using Snowflake. 

This certification will test the ability to:

  • Outline data science concepts
  • Implement Snowflake data science best practices 
  • Prepare data and feature engineering in Snowflake
  • Train and use machine learning models
  • Use data visualization to present a business case (e.g., model explainability)
  • Implement model lifecycle management

 

SNOWPRO ADVANCED: DATA SCIENTIST CANDIDATE

Candidates should have 2+ years of practical data science experience with Snowflake in an enterprise environment.

 

In addition, successful candidates may have:

  • A statistical, mathematical, or science education (or equivalent work experience)

  • A background in one or more programming languages (e.g., Python, R, SQL, PySpark)

  • Experience building and using models on machine learning platforms (e.g., SageMaker, Azure Machine Learning, GCP AI Platform, AutoML tools)

  • An understanding of various open-source and commercial frameworks and libraries (e.g., scikit-learn, TensorFlow)

  • Experience preparing, cleaning, and transforming data sets from multiple sources

  • Experience creating features for machine learning training

  • Experience validating and interpreting models 

  • Experience putting a model into production and monitoring the model in production

 

Target Audience: 

  • Data Scientists, AI/ML Engineers, Quantitative Researchers

 


 

Number of Questions: 70

Question Types:

  • Multiple Select
  • Multiple Choice

Time Limit: 115 minutes

Languages: English

BETA Registration Fee: $175 USD

Passing Score: Candidates will receive their score report after the Beta period ends

Delivery Options: 

 

  1. Online Proctoring - Webassessor
  2. Onsite Kryterion Testing Centers

Prerequisites: SnowPro Core Certified

Click here for information on scheduling your exam.

EXAM OUTLINE

This exam guide includes test domains, weightings, and objectives. It is not a comprehensive listing of all the content that will be presented on this examination. The table below lists the main content domains and their weighting ranges. 

  

Domain                                          Percentage of Exam Questions
1.0 Data Science Concepts                      10% - 15%
2.0 Data Pipelining                            15% - 20%
3.0 Data Preparation and Feature Engineering   25% - 30%
4.0 Model Development                          25% - 30%
5.0 Model Deployment                           15% - 20%

 

EXAM TOPICS

1.0 Domain: Data Science Concepts

1.1 Define machine learning concepts for data science workloads. 

  • Artificial intelligence

  • Machine Learning

    • Supervised learning

    • Unsupervised learning

    • Reinforcement learning

    • Deep learning

1.2 Outline machine learning problem types.

  • Supervised Learning

    • Structured Data

    • Unstructured Data

  • Unsupervised Learning

    • Clustering

1.3 Summarize the machine learning lifecycle.

  • Data Collection

  • Data Visualization and Exploration

  • Feature engineering 

  • Training models

  • Model deployment

  • Model monitoring and evaluation

 

1.4 Outline data governance for data science.

  • Dynamic data masking (see the SQL sketch after this list)

  • Row level security

  • Role Based Access Control (RBAC)
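
For orientation, a minimal Snowflake SQL sketch of these governance controls; the table, column, and role names (customer_data, email, ANALYST, GLOBAL_ANALYST) are hypothetical:

    -- Column masking: roles other than ANALYST see a redacted email value.
    CREATE OR REPLACE MASKING POLICY email_mask AS (val STRING) RETURNS STRING ->
      CASE WHEN CURRENT_ROLE() IN ('ANALYST') THEN val ELSE '*** MASKED ***' END;

    ALTER TABLE customer_data MODIFY COLUMN email SET MASKING POLICY email_mask;

    -- Row-level security: GLOBAL_ANALYST sees all rows, other roles only EMEA rows.
    CREATE OR REPLACE ROW ACCESS POLICY emea_only AS (region STRING) RETURNS BOOLEAN ->
      CURRENT_ROLE() = 'GLOBAL_ANALYST' OR region = 'EMEA';

    ALTER TABLE customer_data ADD ROW ACCESS POLICY emea_only ON (region);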

 

1.5 Outline statistical concepts for data science.

  • Normal distribution

  • Central limit theorem

  • Z and T tests

  • Bootstrapping

  • Confidence intervals

 

1.6 Define model governance for data science.

  • Model versioning
  • Lineage
  • Model explainability

 

2.0 Domain: Data Pipelining 

2.1 Source and collect data into Snowflake from multiple sources.

  • Data loading (see the loading sketch after this list)
  • Snowpipe
  • External tables
  • Materialized views
  • Streams 
  • Tasks
  • Connecting ETL tools (Connectors)
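
As a rough illustration of bulk loading and external tables, a minimal sketch; the stage and table names are hypothetical:

    -- Bulk load staged CSV files into a table.
    COPY INTO raw.events
    FROM @raw.csv_stage
    FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);

    -- External table: query Parquet files in cloud storage without loading them.
    -- AUTO_REFRESH assumes cloud event notifications are configured.
    CREATE OR REPLACE EXTERNAL TABLE raw.events_ext
      WITH LOCATION = @raw.parquet_stage
      FILE_FORMAT = (TYPE = 'PARQUET')
      AUTO_REFRESH = TRUE;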


2.2 Enrich data by consuming data sharing sources.

  • Snowflake Data Marketplace
  • Direct Sharing
  • Shared database considerations


2.3 Create a development environment (e.g., sandbox) and maintain the environment.

  • Cloning (see the sketch after this list)
  • Levels of hierarchy
  • Automation to keep datasets updated
  • Time Travel
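
A minimal sketch of a cloned sandbox plus Time Travel; the database and table names (prod_db, ds_sandbox) are hypothetical:

    -- Zero-copy clone: instant, and consumes no extra storage until data diverges.
    CREATE DATABASE ds_sandbox CLONE prod_db;

    -- Time Travel: query a table as it existed one hour ago (within the retention period).
    SELECT *
    FROM ds_sandbox.public.train_events AT (OFFSET => -3600);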

2.4 Build a data science pipeline.

  • Automation of data transformation
  • Streams and tasks
  • Functions
  • Stored procedures  
  • Connect Snowflake to machine learning platforms (e.g., connectors, ML partners, etc.)   

 

3.0 Domain: Data Movement

3.1 Data Loading & Data Unloading

  • List best practices and the impact of different scenarios 

3.2 Continuous Data Loads Using Snowpipe

  • Outline how Snowpipe differs from bulk data loading
  • SQL syntax to create a pipe (see the sketch below)
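
Unlike a one-time bulk COPY, a pipe loads files continuously as they land on a stage. A minimal sketch, with hypothetical stage and table names:

    -- AUTO_INGEST = TRUE relies on cloud storage event notifications.
    CREATE OR REPLACE PIPE raw.events_pipe
      AUTO_INGEST = TRUE
    AS
      COPY INTO raw.events
      FROM @raw.events_stage
      FILE_FORMAT = (TYPE = 'JSON');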

3.3 Streams & Tasks

  • Working with Streams and Tasks
  • SQL syntax to create and clone a Stream and Task (see the sketch below)
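
A minimal sketch of a stream feeding a scheduled task; the table, warehouse, and task names are hypothetical:

    -- Capture row-level changes on the source table.
    CREATE OR REPLACE STREAM raw.events_stream ON TABLE raw.events;

    -- Run every 5 minutes, but only when the stream has new data.
    CREATE OR REPLACE TASK raw.refresh_features
      WAREHOUSE = transform_wh
      SCHEDULE = '5 MINUTE'
    WHEN SYSTEM$STREAM_HAS_DATA('RAW.EVENTS_STREAM')
    AS
      INSERT INTO analytics.user_features
      SELECT user_id, COUNT(*) AS event_count
      FROM raw.events_stream
      GROUP BY user_id;

    ALTER TASK raw.refresh_features RESUME;  -- tasks are created suspended

    -- Cloning a stream preserves its current offset.
    CREATE STREAM raw.events_stream_dev CLONE raw.events_stream;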

 

4.0 Domain: Model Development

4.1 Connect data science tools directly to data in Snowflake.

  • Connectors

    • Python connector with pandas support

    • Spark connector

    • R connector

  • Snowflake Best Practices

    • One platform, one copy of data, many workloads

    • Enrich datasets using the data marketplace

    • Streams and Tasks

    • External tables 

    • External functions to trigger training

    • Zero-copy cloning for training snapshots

    • Materialized views for training and prediction

    • Snowflake SQL for aggregation and sampling (see the sketch below)
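
For example, sampling and aggregation can be pushed down into Snowflake before data is pulled into a training environment; the table and column names here are hypothetical:

    -- Roughly 10% row-level (Bernoulli) sample for quick experimentation.
    SELECT * FROM analytics.training_rows SAMPLE BERNOULLI (10);

    -- Aggregate features in SQL rather than in the client.
    SELECT user_id,
           COUNT(*)        AS sessions,
           AVG(duration_s) AS avg_duration
    FROM analytics.sessions
    GROUP BY user_id;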

 

4.2 Train a data science model. 

  • Hyperparameter tuning 

  • Optimization metric selection 

  • Partitioning

  • Down/Up-sampling

 

4.3 Validate a data science model.

  • ROC curve/confusion matrix

  • Regression problems

  • Residuals plot

  • Model metrics

 

4.4 Interpret a model. 

  • Feature impact

  • Partial dependence plots

  • Confidence intervals

 

5.0 Domain: Model Deployment

5.1 Move a data science model into production. 

  • Deploy an external hosted model

  • Deploy a model in Snowflake

 

5.2 Score the effectiveness of a model and retrain if necessary.  

  • Metrics for model evaluation

  • External functions 

  • User-defined functions (UDFs) (see the scoring sketch after this list)

  • Storing predictions

  • Use Snowsight for distribution comparisons
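
A minimal scoring sketch: a SQL UDF applying coefficients from a model trained elsewhere, an external function calling a hosted endpoint, and stored predictions for later drift comparison. All names, coefficients, and the endpoint URL are hypothetical:

    -- SQL UDF: logistic scoring with hypothetical, externally trained coefficients.
    CREATE OR REPLACE FUNCTION score_churn(tenure_months FLOAT, monthly_spend FLOAT)
    RETURNS FLOAT
    AS $$
      1 / (1 + EXP(-(-2.0 + 0.05 * tenure_months + 0.01 * monthly_spend)))
    $$;

    -- External function: delegate scoring to a hosted model endpoint.
    -- Assumes an API integration named ml_api_integration already exists.
    CREATE OR REPLACE EXTERNAL FUNCTION predict_churn(features VARIANT)
      RETURNS VARIANT
      API_INTEGRATION = ml_api_integration
      AS 'https://example.com/model/predict';

    -- Store predictions with a timestamp so score distributions can be compared over time.
    INSERT INTO analytics.predictions
    SELECT user_id, score_churn(tenure_months, monthly_spend) AS score, CURRENT_TIMESTAMP()
    FROM analytics.customers;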



5.3 Outline model lifecycle and validation tools. 

  • Streams and Tasks

  • Metadata tagging

  • Partner model versioning

  • Automation of model retraining

 


RECOMMENDED TRAINING

We recommend candidates have at least two years of hands-on practitioner experience with Snowflake in a data science role before attempting this exam. The exam assesses skills through scenario-based questions and real-world examples. As preparation, we recommend a combination of hands-on experience, instructor-led training, and self-study assets.

 

Instructor-led Course recommended for this exam:

Free Self-Study recommended for this exam:

 

Ready to register? Click here for information on scheduling your exam.