
AWS Certified Machine Learning Specialty
Issued by AWS
This is an advanced and important certification for professionals who want to specialize in machine learning using Amazon Web Services (AWS). It focuses on advanced machine learning skills, including designing, building, and developing machine learning models with AWS services. This covers working with big data, selecting appropriate algorithms, and evaluating model performance.
Domain 1: Data Engineering
Task Statement 1.1: Create data repositories for ML.
• Identify data sources (for example, content and location, primary sources such as user data).
• Determine storage mediums (for example, databases, Amazon S3, Amazon Elastic File System [Amazon EFS], Amazon Elastic Block Store [Amazon EBS]).
Task Statement 1.2: Identify and implement a data ingestion solution.
• Identify data job styles and job types (for example, batch load, streaming).
• Orchestrate data ingestion pipelines (batch-based ML workloads and streaming-based ML workloads).
  o Amazon Kinesis
  o Amazon Data Firehose
  o Amazon EMR
  o AWS Glue
  o Amazon Managed Service for Apache Flink
• Schedule jobs.
Task Statement 1.3: Identify and implement a data transformation solution.
• Transform data in transit (ETL, AWS Glue, Amazon EMR, AWS Batch).
• Handle ML-specific data by using MapReduce (for example, Apache Hadoop, Apache Spark, Apache Hive).
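The MapReduce pattern named in the last bullet can be illustrated with a minimal single-process sketch. This is plain Python only: it does not use any Hadoop, Spark, or Hive API, and the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical names chosen for illustration. The frameworks listed above distribute the same three phases across a cluster.

```python
from collections import defaultdict
from functools import reduce


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: fold the values of each key into a single count."""
    return {key: reduce(lambda a, b: a + b, values) for key, values in groups.items()}


def word_count(records):
    """Run map, shuffle, and reduce in sequence over a list of text records."""
    pairs = [pair for record in records for pair in map_phase(record)]
    return reduce_phase(shuffle_phase(pairs))


# Illustrative input only; a real job would read records from storage such as Amazon S3.
counts = word_count(["spark hadoop hive", "Spark spark emr"])
print(counts)
```

In a distributed setting the map and reduce steps run in parallel on different nodes, and the shuffle step moves data between them, which is usually the expensive part.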
Domain 2: Exploratory Data Analysis
Task Statement 2.1: Sanitize and prepare data for modeling.
• Identify and handle missing data, corrupt data, and stop words.
• Format, normalize, augment, and scale data.
• Determine whether there is sufficient labeled data.
  o Identify mitigation strategies.
  o Use data labelling tools (for example, Amazon Mechanical Turk).
Task Statement 2.2: Perform feature engineering.
• Identify and extract features from datasets, including from data sources such as text, speech, images, and public datasets.
• Analyze and evaluate feature engineering concepts (for example, binning, tokenization, outliers, synthetic features, one-hot encoding, reducing dimensionality of data).
Task Statement 2.3: Analyze and visualize data for ML.
• Create graphs (for example, scatter plots, time series, histograms, box plots).
• Interpret descriptive statistics (for example, correlation, summary statistics, p-value).
• Perform cluster analysis (for example, hierarchical, diagnosis, elbow plot, cluster size).
Domain 3: Modeling
Task Statement 3.1: Frame business problems as ML problems.
• Determine when to use and when not to use ML.
• Know the difference between supervised and unsupervised learning.
• Select from among classification, regression, forecasting, clustering, recommendation, and foundation models.
Task Statement 3.2: Select the appropriate model(s) for a given ML problem.
• XGBoost, logistic regression, k-means, linear regression, decision trees, random forests, RNN, CNN, ensemble, transfer learning, and large language models (LLMs)
• Express the intuition behind models.
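As one example of expressing the intuition behind a model, k-means from the list above can be reduced to two alternating steps: assign each point to its nearest centroid, then move each centroid to the mean of its points. The sketch below is a minimal one-dimensional version in plain Python; the function name k_means_1d, the sample points, and the starting centroids are illustrative assumptions, not part of the exam guide.

```python
def k_means_1d(points, centroids, iterations=10):
    """Alternate assignment and update steps for one-dimensional k-means."""
    centroids = list(centroids)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            sum(cluster) / len(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids


# Two obvious groups: values near 2 and values near 11.
result = k_means_1d([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], centroids=[0.0, 5.0])
print(result)  # centroids settle at [2.0, 11.0]
```

The result depends on the starting centroids, which is why practical implementations typically run several random initializations.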
Task Statement 3.3: Train ML models.
• Split data between training and validation (for example, cross validation).
• Understand optimization techniques for ML training (for example, gradient descent, loss functions, convergence).
• Choose appropriate compute resources (for example, GPU or CPU, distributed or non-distributed).
  o Choose appropriate compute platforms (Spark or non-Spark).
• Update and retrain models.
  o Batch or real-time/online
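The first two bullets of Task Statement 3.3 (splitting data and optimizing with gradient descent) can be sketched together. This is a minimal plain-Python illustration under stated assumptions: a single-feature linear model, mean squared error as the loss function, and synthetic noiseless data generated from y = 3x + 1. The function names and the default learning rate and step count are hypothetical choices for this example.

```python
import random


def train_validation_split(rows, validation_fraction=0.25, seed=0):
    """Shuffle the rows reproducibly and hold out a fraction for validation."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - validation_fraction))
    return shuffled[:cut], shuffled[cut:]


def mse(rows, w, b):
    """Loss function: mean squared error of the line y = w * x + b."""
    return sum((w * x + b - y) ** 2 for x, y in rows) / len(rows)


def gradient_descent(rows, learning_rate=0.05, steps=2000):
    """Minimize the MSE by repeatedly stepping against its gradient."""
    w, b = 0.0, 0.0
    n = len(rows)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in rows) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in rows) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Synthetic, noiseless data for illustration only: y = 3x + 1.
data = [(i / 10, 3.0 * (i / 10) + 1.0) for i in range(20)]
train, validation = train_validation_split(data)
w, b = gradient_descent(train)
print(f"w={w:.3f} b={b:.3f} validation MSE={mse(validation, w, b):.2e}")
```

Convergence here means the loss stops decreasing meaningfully between steps. A learning rate that is too large makes the updates diverge, and one that is too small makes training slow.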
Task Statement 3.4: Perform hyperparameter optimization.
• Perform regularization.
  o Dropout
  o L1/L2
• Perform cross-validation.
• Initialize models.
• Understand neural network architecture (layers and nodes), learning rate, and activation functions.
• Understand tree-based models (number of trees, number of levels).
• Understand linear models (learning rate).
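Two of the items under Task Statement 3.4, cross-validation and L1/L2 regularization, can be sketched in a few lines. The example below is a plain-Python illustration, not a library API: k_fold_indices, l1_penalty, and l2_penalty are hypothetical helper names, and lam stands for the regularization strength, which is itself a hyperparameter to tune.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


def l1_penalty(weights, lam):
    """L1 term added to the loss: lam times the sum of absolute weights."""
    return lam * sum(abs(w) for w in weights)


def l2_penalty(weights, lam):
    """L2 term added to the loss: lam times the sum of squared weights."""
    return lam * sum(w * w for w in weights)


for train_idx, val_idx in k_fold_indices(10, 3):
    print("train:", train_idx, "validation:", val_idx)
```

A typical hyperparameter search trains one model per fold for each candidate value of lam and keeps the value with the best average validation score. In practice the data is usually shuffled before it is split into folds.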
Task Statement 3.5: Evaluate ML models.
• Avoid overfitting or underfitting.
  o Detect and handle bias and variance.
• Evaluate metrics (for example, area under curve [AUC]-receiver operating characteristics [ROC], accuracy, precision, recall, Root Mean Square Error [RMSE], F1 score).
• Interpret confusion matrices.
• Perform offline and online model evaluation (A/B testing).
• Compare models by using metrics (for example, time to train a model, quality of model, engineering costs).
• Perform cross-validation.
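Several of the metrics listed under Task Statement 3.5 follow directly from the confusion matrix. The sketch below computes them in plain Python for binary labels where 1 is the positive class; the function names and the sample labels are illustrative assumptions.

```python
import math


def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and their harmonic mean (F1); 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def accuracy(tp, fp, fn, tn):
    """Share of all predictions that were correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def rmse(y_true, y_pred):
    """Root Mean Square Error for regression targets."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
print("tp, fp, fn, tn:", tp, fp, fn, tn)
print("precision, recall, f1:", precision_recall_f1(tp, fp, fn))
print("accuracy:", accuracy(tp, fp, fn, tn))
```

Accuracy alone can mislead on imbalanced classes, which is one reason precision, recall, and F1 are listed alongside it.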
Domain 4: Machine Learning Implementation and Operations
Task Statement 4.1: Build ML solutions for performance, availability, scalability, resiliency, and fault tolerance.
• Log and monitor AWS environments.
  o AWS CloudTrail and Amazon CloudWatch
  o Build error monitoring solutions.
• Deploy to multiple AWS Regions and multiple Availability Zones.
• Create AMIs and golden images.
• Create Docker containers.
• Deploy Auto Scaling groups.
• Right size resources (for example, instances, Provisioned IOPS, volumes).
• Perform load balancing.
• Follow AWS best practices.
Task Statement 4.2: Recommend and implement the appropriate ML services and features for a given problem.
• ML on AWS (application services), for example:
  o Amazon Polly
  o Amazon Lex
  o Amazon Transcribe
  o Amazon Q
• Understand AWS service quotas.
• Determine when to build custom models and when to use Amazon SageMaker built-in algorithms.
• Understand AWS infrastructure (for example, instance types) and cost considerations.
  o Use Spot Instances to train deep learning models by using AWS Batch.