The mid-level scikit-learn exam. Built by the people who maintain it.

The Professional Practitioner Certification is for working data scientists. Regularization, ensembles, feature engineering, nested cross-validation, and the judgement to pick a model and defend it to the business.

Seven competencies of a working mid-level data scientist.

The Professional certification verifies that holders have both the conceptual understanding and the practical skills of a mid-level data scientist.

  • Advanced ML knowledge — Proficiency in a broad range of machine learning algorithms and the ability to select appropriate models for specific problems.

  • Programming expertise — Strong coding skills in Python, with experience in optimizing code for performance and scalability.

  • Data handling and engineering — Ability to handle large datasets, including data extraction, transformation, and loading processes.

  • Feature engineering — Experience in creating and selecting features to improve model performance.

  • Tuning and optimization — Proficiency in hyperparameter tuning, model selection, and ensemble methods to improve model performance.

  • Critical thinking — Ability to approach complex problems systematically and evaluate multiple solutions, including diagnosing issues in a model pipeline.

  • Business expertise — Understanding of how ML projects align with business goals and how to translate technical results into actionable business insights.

Five topics. The shape of the Professional exam.

A step beyond Associate. You need to recognize when a model is regularized correctly and when a cross-validation (CV) strategy leaks, and to communicate both to non-technical readers.

Machine learning concepts

The advanced mental model. Probabilistic outputs, regularization regimes, and what overfitting does to soft predictions.

  • Supervised and unsupervised learning: regression, classification, clustering, dimensionality reduction
  • Model families: tree-based, linear, ensemble, neighbors
  • Regularization: L1, L2, Elastic Net
  • Hard and soft predictions: predict vs predict_proba
  • Overfitting and underfitting, impact on soft predictions
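
As an illustration of the hard versus soft prediction topic, here is a minimal sketch on synthetic data. The dataset, the two values of C, and the variable names are assumptions chosen for the example, not exam material. It compares a strongly and a weakly regularized logistic regression and shows that the hard labels are the most probable class of the soft output.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data, for illustration only.
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# C is the inverse regularization strength: a smaller C means a stronger
# penalty (L2 by default) and coefficients shrunk closer to zero.
strong = LogisticRegression(C=0.01, max_iter=1000).fit(X_train, y_train)
weak = LogisticRegression(C=100.0, max_iter=1000).fit(X_train, y_train)

hard = weak.predict(X_test)        # class labels, shape (n_samples,)
soft = weak.predict_proba(X_test)  # class probabilities, shape (n_samples, 2)

# The hard prediction is the class with the highest predicted probability.
hard_from_soft = weak.classes_[soft.argmax(axis=1)]

# Average confidence in the predicted class, to compare both regimes.
conf_strong = strong.predict_proba(X_test).max(axis=1).mean()
conf_weak = weak.predict_proba(X_test).max(axis=1).mean()
print(f"mean confidence, strong penalty: {conf_strong:.3f}")
print(f"mean confidence, weak penalty:   {conf_weak:.3f}")
```

Comparing the two confidence values is one way to see how the regularization regime affects soft predictions even when the hard predictions barely change.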

Model building and evaluation

Pick the baseline, regularize the noise, ensemble when warranted, and choose the metric that fits the problem.

  • Linear models as baselines
  • Handling correlation with regularization and feature selection
  • Bagging and boosting, the working ensemble methods
  • Choosing metrics that are robust to outliers and class imbalance
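
To make the metric choice concrete, the sketch below uses a synthetic dataset with roughly 5% positives (an assumption for the example) and shows why plain accuracy can mislead in an imbalanced setting: a classifier that always predicts the majority class scores high accuracy while its balanced accuracy stays at chance level. A class-weighted linear model is included as a baseline.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, balanced_accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic, heavily imbalanced data (about 95% negatives), illustration only.
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# A "model" that always predicts the majority class.
dummy = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
y_dummy = dummy.predict(X_test)
acc_dummy = accuracy_score(y_test, y_dummy)
bal_dummy = balanced_accuracy_score(y_test, y_dummy)

# A linear baseline that reweights classes inversely to their frequency.
model = LogisticRegression(class_weight="balanced", max_iter=1000)
y_model = model.fit(X_train, y_train).predict(X_test)
acc_model = accuracy_score(y_test, y_model)
bal_model = balanced_accuracy_score(y_test, y_model)

print(f"dummy:    accuracy={acc_dummy:.3f}  balanced accuracy={bal_dummy:.3f}")
print(f"baseline: accuracy={acc_model:.3f}  balanced accuracy={bal_model:.3f}")
```

Balanced accuracy is only one option. Depending on the cost of each error type, precision, recall, or average precision may fit the problem better.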

Interpretation and communication

Read the plot, name the failure mode, explain it without using the word probability twice.

  • Visualizing results with intermediate matplotlib and seaborn techniques
  • Interpreting model outputs and performance metrics
  • Communicating results to non-technical stakeholders

Data preprocessing

Heatmaps, PCA, polynomial features, label propagation. The shaping work that makes a real-world dataset trainable.

  • Loading parquet datasets
  • Heatmaps and PCA for first look
  • Identifying strongly correlated features
  • Handling missing values in the target via label propagation
  • Feature engineering with PolynomialFeatures and SplineTransformer
  • Combining features with FeatureUnion

Model selection and validation

Group structure, non-i.i.d. data, nested CV, stable hyperparameters across folds.

  • Cross-validation with group structure and non-i.i.d. data
  • Hyperparameter tuning: GridSearchCV, RandomizedSearchCV
  • Stability of optimal hyperparameters via nested cross-validation
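
A minimal sketch of nested cross-validation on grouped data, assuming a synthetic dataset, hypothetical group labels, and a small grid over C. The inner loop tunes the hyperparameter, the outer loop estimates generalization, and GroupKFold keeps every group on one side of each split. Collecting the best parameter of each outer fold gives a quick view of hyperparameter stability.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, GroupKFold

# Synthetic data with a hypothetical group structure: 30 groups of 10 rows
# (for example, 10 measurements per patient).
X, y = make_classification(n_samples=300, random_state=0)
groups = np.repeat(np.arange(30), 10)

outer_cv = GroupKFold(n_splits=5)
inner_cv = GroupKFold(n_splits=3)
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}

outer_scores, best_C = [], []
for train_idx, test_idx in outer_cv.split(X, y, groups):
    # No group may appear on both sides of the outer split.
    overlap = np.intersect1d(groups[train_idx], groups[test_idx])
    assert overlap.size == 0

    # Inner loop: tune C using only the outer training rows and their groups.
    search = GridSearchCV(
        LogisticRegression(max_iter=1000), param_grid, cv=inner_cv
    )
    search.fit(X[train_idx], y[train_idx], groups=groups[train_idx])

    # Outer loop: score the refitted best model on rows it has never seen.
    outer_scores.append(search.score(X[test_idx], y[test_idx]))
    best_C.append(search.best_params_["C"])

print("outer accuracy per fold:", np.round(outer_scores, 3))
print("best C per fold:", best_C)
```

If the selected C changes widely from fold to fold, the tuned value is unstable and the averaged outer score is a more trustworthy summary than any single best setting.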

Three levels. You are on the second.

Three certifications, each matching a level and a typical data scientist career path.

Associate Practitioner

Junior data scientist. Fundamental ML, preprocessing, evaluation.

Professional

Mid-level. Regularization, ensembles, feature engineering, nested CV.

Expert

Senior practitioner. Production ML, scaling, governance.

Prepare with the Professional course on Skolar. Free to start.

The Professional track on Skolar matches this exam: regularization, ensembles, feature unions, and nested validation, with notebooks and practice questions written by the scikit-learn team.

Logistics, plain.

Everything you need to plan your sitting, in five answers.

Do I need Associate before Professional? No. Associate is recommended as a stepping stone but not required. If you have a year or two of hands-on data science work with scikit-learn, you can sit Professional directly.

Is there a hands-on component? Yes. The Professional exam adds one hands-on lab on top of the multiple-choice questions. You will write and tune a small pipeline against a held-out dataset, in a sandboxed scikit-learn environment.

What about retakes? One retake is included with your registration. After that, retakes are discounted. There is a 21-day cool-down between attempts so you can revisit weak topics on Skolar.

Is the credential verifiable? Yes. Every passing candidate gets a credential ID and a public verification page on probabl.ai. Recruiters can confirm validity without contacting you.

Does it expire? The Professional certification is valid for 3 years. Renew by passing the Expert exam, or by retaking Professional at a discount.

Certify the work you already do, with scikit-learn.

120 minutes. $349 USD. Multiple-choice questions plus a hands-on lab, and a credential issued by the maintainers themselves.