The mid-level scikit-learn exam. Built by the people who maintain it.
The Professional Practitioner Certification is for working data scientists. Regularization, ensembles, feature engineering, nested cross-validation, and the judgement to pick a model and defend it to the business.
Seven competencies of a working mid-level data scientist.
The Professional certification verifies that certified professionals have both the conceptual understanding and the practical skills of a mid-level data scientist.
Advanced ML knowledge — Proficiency in a broad range of machine learning algorithms and the ability to select appropriate models for specific problems.
Programming expertise — Strong coding skills in Python, with experience in optimizing code for performance and scalability.
Data handling and engineering — Ability to handle large datasets, including data extraction, transformation, and loading processes.
Feature engineering — Experience in creating and selecting features to improve model performance.
Tuning and optimization — Proficiency in hyperparameter tuning, model selection, and ensemble methods to improve model performance.
Critical thinking — Approach complex problems systematically and evaluate multiple solutions, including diagnosing issues in a model pipeline.
Business expertise — How ML projects align with business goals and how to translate technical results into actionable business insights.
Five topics. The shape of the Professional exam.
A step beyond Associate. You need to recognize when a model is regularized correctly, when a CV strategy leaks, and how to communicate that to non-technical readers.
Machine learning concepts
The advanced mental model. Probabilistic outputs, regularization regimes, and what overfitting does to soft predictions.
- Supervised and unsupervised learning: regression, classification, clustering, dimensionality reduction
- Model families: tree-based, linear, ensemble, neighbors
- Regularization: L1, L2, Elastic Net
- Hard and soft predictions: predict vs predict_proba
- Overfitting and underfitting, impact on soft predictions
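To make the hard versus soft distinction concrete, here is a minimal sketch, not exam material. The synthetic dataset, the two values of `C`, and the variable names are illustrative assumptions. It fits two L2-regularized logistic regressions and compares their class labels, probabilities, and coefficient sizes.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data, purely illustrative.
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# In scikit-learn, a small C means a strong L2 penalty; a large C means a weak one.
strong = LogisticRegression(C=0.01, max_iter=1000).fit(X_train, y_train)
weak = LogisticRegression(C=100.0, max_iter=1000).fit(X_train, y_train)

hard = strong.predict(X_test)        # hard predictions: one class label per row
soft = strong.predict_proba(X_test)  # soft predictions: one probability per class

# Stronger regularization shrinks the coefficients, which tends to pull
# the predicted probabilities toward 0.5.
norm_strong = np.linalg.norm(strong.coef_)
norm_weak = np.linalg.norm(weak.coef_)
conf_strong = strong.predict_proba(X_test).max(axis=1).mean()
conf_weak = weak.predict_proba(X_test).max(axis=1).mean()

print(f"coef norm   strong={norm_strong:.2f}  weak={norm_weak:.2f}")
print(f"mean max p  strong={conf_strong:.2f}  weak={conf_weak:.2f}")
```

Two models can agree on most hard predictions and still differ a lot in how confident their soft predictions are, which matters as soon as a threshold or a ranking is built on the probabilities.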
Model building and evaluation
Pick the baseline, regularize the noise, ensemble when warranted, and choose the metric that fits the problem.
- Linear models as baselines
- Handling correlation with regularization and feature selection
- Bagging and boosting, the working ensemble methods
- Choosing metrics for outliers and imbalanced settings
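As a rough illustration of the workflow above, the following sketch compares a linear baseline with a bagging and a boosting ensemble on an imbalanced synthetic problem. The dataset, the model settings, and the choice of metrics are assumptions made for the example, not a prescribed recipe.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Roughly 90/10 class imbalance, purely illustrative.
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=5,
    weights=[0.9, 0.1], random_state=0,
)

models = {
    "linear baseline": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    # BaggingClassifier uses decision trees as its default base estimator.
    "bagging": BaggingClassifier(n_estimators=50, random_state=0),
    "boosting": GradientBoostingClassifier(random_state=0),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
# Plain accuracy can look good on imbalanced data even for a weak model,
# so it is reported next to two metrics that account for the minority class.
scoring = ["accuracy", "balanced_accuracy", "average_precision"]

results = {}
for name, model in models.items():
    scores = cross_validate(model, X, y, cv=cv, scoring=scoring)
    results[name] = {m: scores[f"test_{m}"].mean() for m in scoring}
    print(name, {m: round(v, 3) for m, v in results[name].items()})
```

Starting from the linear baseline makes it easy to tell whether an ensemble earns its extra complexity on the metric that actually fits the problem.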
Interpretation and communication
Read the plot, name the failure mode, explain it without using the word probability twice.
- Visualizing results with intermediate matplotlib and seaborn techniques
- Interpreting model outputs and performance metrics
- Communicating results to non-technical stakeholders
Data preprocessing
Heatmaps, PCA, polynomial features, label propagation. The shaping work that makes a real-world dataset trainable.
- Loading parquet datasets
- Heatmaps and PCA for first look
- Identifying strongly correlated features
- Handling missing values in the target via label propagation
- Feature engineering with PolynomialFeatures, SplineTransformer
- Combining features with FeatureUnion
Model selection and validation
Group structure, non-i.i.d. data, nested CV, stable hyperparameters across folds.
- Cross-validation with group structure and non-i.i.d. data
- Hyperparameter tuning: GridSearchCV, RandomizedSearchCV
- Stability of optimal hyperparameters via nested cross-validation
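One way these three ideas fit together is sketched below: an outer group-aware split for an honest score, an inner group-aware grid search for tuning, and a record of which hyperparameter wins in each outer fold. The data, the group labels, and the grid are illustrative assumptions. The loop is written by hand so that the groups are passed explicitly to both levels.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, GroupKFold

# Illustrative data: 30 hypothetical groups (e.g. patients) of 10 rows each.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
groups = np.repeat(np.arange(30), 10)

outer = GroupKFold(n_splits=5)  # estimates generalization performance
inner = GroupKFold(n_splits=3)  # selects hyperparameters
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}

outer_scores, best_cs = [], []
for train_idx, test_idx in outer.split(X, y, groups):
    search = GridSearchCV(
        LogisticRegression(max_iter=1000), param_grid, cv=inner
    )
    # The inner search only ever sees the outer training rows and their groups.
    search.fit(X[train_idx], y[train_idx], groups=groups[train_idx])
    outer_scores.append(search.score(X[test_idx], y[test_idx]))
    best_cs.append(search.best_params_["C"])

print("outer scores:", np.round(outer_scores, 3))
print("best C per outer fold:", best_cs)
```

If the selected `C` jumps around from fold to fold, the tuned value is not stable, and the spread of the outer scores is a more honest summary than the best inner score.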
Three levels. You are on the second.
Three certifications, each matching a level and a typical data scientist career path.
Associate Practitioner
Junior data scientist. Fundamental ML, preprocessing, evaluation.
Professional
Mid-level. Regularization, ensembles, feature engineering, nested CV.
Expert
Senior practitioner. Production ML, scaling, governance.
Prepare with the Professional course on Skolar. Free to start.
The Professional track on Skolar matches this exam: regularization, ensembles, feature unions, and nested validation, with notebooks and practice questions written by the scikit-learn team.
Logistics, plain.
Everything you need to plan your sitting, in six lines.
Do I need Associate before Professional? No. Associate is recommended as a stepping stone but not required. If you have a year or two of working data science with scikit-learn, you can sit Professional directly.
Is there a hands-on component? Yes. The Professional exam adds one hands-on lab on top of the multiple-choice questions. You will write and tune a small pipeline against a held-out dataset, in a sandboxed scikit-learn environment.
What about retakes? One retake is included with your registration. After that, retakes are discounted. There is a 21-day cool-down between attempts so you can revisit weak topics on Skolar.
Is the credential verifiable? Yes. Every passing candidate gets a credential ID and a public verification page on probabl.ai. Recruiters can confirm validity without contacting you.
Does it expire? The Professional certification is valid for 3 years. Renew by passing the Expert exam, or by re-taking Professional at a discount.
Certify the work you already do, with scikit-learn.
120 minutes. $349 USD. Multiple-choice plus a hands-on lab, and a credential issued by the maintainers themselves.