A data scientist is building a SparkML pipeline that includes StringIndexer, OneHotEncoder, VectorAssembler, and LogisticRegression. The pipeline must be reusable for both training and scoring. Which approach correctly constructs this pipeline?