Assessment 4: Machine Learning Project

September 26th, 2018 by admin

## Assessment Information

Machine Learning, Assessment 4: Machine Learning Project

This document supplies detailed information on assessment tasks for this unit.

## Learning Outcomes

This assessment assesses the following Unit Learning Outcomes (ULO) and related Graduate Learning Outcomes (GLO):

| Unit Learning Outcome (ULO) | Graduate Learning Outcome (GLO) |
| --- | --- |
| ULO 2: Perform linear regression, classification using logistic regression and linear Support Vector Machines. | GLO 1: Discipline knowledge and capabilities; GLO 5: Problem solving |
| ULO 3: Perform non-linear classification using Support Vector Machines with kernels, Decision trees and Random forests. | GLO 1: Discipline knowledge and capabilities; GLO 5: Problem solving |
| ULO 4: Understand the concept of maximum likelihood and Bayesian estimation. | GLO 1: Discipline knowledge and capabilities; GLO 5: Problem solving |
| ULO 5: Construct a multi-layer neural network using backpropagation training algorithm. | GLO 1: Discipline knowledge and capabilities |
| ULO 6: Perform model selection and compute relevant evaluation measure for a given problem. | GLO 2: Communication |

## Purpose

This assessment is an extensive machine learning project. Students will be given a specific data set for analysis and will be required to develop and compare various classification techniques. Each student must demonstrate skills acquired in data representation, classification and evaluation.

## Instructions

- The dataset consists of training and testing data in “train” and “test” folders. Use training data: X_train.txt, labels: y_train.txt and testing data: X_test.txt, labels: y_test.txt. There are other files that also come with the dataset and may be useful in understanding the dataset better.
- Please read the pdf file “dataset-paper.pdf” to answer Part 1.

## Task A: Understanding the data

Answer the following questions briefly, after reading the paper.

- What is the objective of the data collection process?
- What human activity types does this dataset have?
How many subjects/people have performed these activities?
- How many instances are available in the training and test sets? How many features are used to represent each instance? Summarize the type of features extracted in 2-3 sentences.
- Describe briefly what machine learning model is used in this paper for activity recognition and how it is trained. What is the maximum accuracy achieved?

(3 Marks)

## Task B: K-Nearest Neighbor Classification

Build a K-Nearest Neighbor classifier for this data.

- Let K take values from 1 to 50. For choosing the best K, use 10-fold cross-validation. Choose the best value of K based on model F1-score.
- Show a plot of cross-validation accuracy with respect to K.
- Using the best K value, evaluate the model performance on the supplied test set. Report the confusion matrix, multi-class averaged F1-score and accuracy.

(5 Marks)

## Task C: Multiclass Logistic Regression with Elastic Net

Build an elastic-net regularized logistic regression classifier for this data.

- Elastic-net regularizer takes in 2 parameters: alpha and l1-ratio. Use the following values for alpha: 1e-4, 3e-4, 1e-3, 3e-3, 1e-2, 3e-2. Use the following values for l1-ratio: 0, 0.15, 0.5, 0.7, 1.
- Choose the best values of alpha and l1-ratio using 10-fold cross-validation, based on model F1-score.
- Draw a surface plot of F1-score with respect to alpha and l1-ratio values.
- Use the best value of alpha and l1-ratio to re-train the model on the training set and use it to predict the labels of the test set. Report the confusion matrix, multi-class averaged F1-score and accuracy.

(5 Marks)

## Task D: Support Vector Machine (RBF Kernel)

Build an SVM (with RBF kernel) classifier for this data.

- SVM with RBF takes 2 parameters: gamma (length scale of the RBF kernel) and C (the cost parameter). Use the following values for gamma: 1e-3, 1e-4.
Use the following values for C: 1, 10, 100, 1000.
- Choose the best values of gamma and C using 10-fold cross-validation, based on model F1-score.
- Draw a surface plot of F1-score with respect to gamma and C.
- Use the best value of gamma and C to re-train the model on the training set and use it to predict the labels of the test set. Report the confusion matrix, multi-class averaged F1-score and accuracy.

(6 Marks)

## Task E: Random Forest

Build a Random forest classifier for this data.

- Random forest uses two parameters: the tree-depth for each decision tree and the number of trees. Use the following values for the tree-depth: 300, 500, 600. Use the following values for the number of trees: 200, 500, 700.
- Choose the best values of tree-depth and number of trees using 10-fold cross-validation, based on model F1-score.
- Draw a surface plot of F1-score with respect to tree-depth and number of trees.
- Use the best value of tree-depth and number of trees to re-train the model on the training set and use it to predict the labels of the test set. Report the confusion matrix, multi-class averaged F1-score and accuracy.

(6 Marks)

## Task F: Discussion

Write a brief discussion about which classification method achieved the best performance. Give your thoughts on the reason behind this. What method performed the worst? Could you do better or worse than the results in the dataset paper? Do you have any suggestions to further improve model performances?

(5 Marks)

## Submission details

University has a strict standard on plagiarism as a part of Academic Integrity.
To avoid any issues with plagiarism, students are strongly encouraged to run the similarity check with Turnitin.

## Special consideration

You may be eligible for special consideration if circumstances beyond your control prevent you from undertaking or completing an assessment task at the scheduled time. See the following link for advice on the application process: http://www.deakin.edu.au/students/studying/assessment-and-results/special-consideration

## Referencing

You must correctly use the Harvard method in this assessment.

## Assessment Task 4: Individual problem-solving rubric

| Criteria | Excellent | Good | Fair | Unsatisfactory |
| --- | --- | --- | --- | --- |
| Criteria 1: Understand the data by reading the provided research article and answer four questions asked in the Part 1 of the assignment. | Successfully answered all four questions. (3 Marks) | Successfully answered at least 2 questions and satisfactorily tried others. (2 Marks) | Successfully answered only one question. (1 Mark) | Failed to answer any question satisfactorily. (0 Marks) |
| Criteria 2: Build a K-Nearest Neighbor classifier for this data: (1) Choose the best K value from given set of values and F1-score. (2) Show a plot of cross-validation accuracy with respect to K. (3) Using the best K value, evaluate the model performance using the supplied test set. (4) Report the results as requested in the assignment. | Successfully completed all four tasks. (5 Marks) | Successfully completed any two of the four tasks and satisfactorily tried one of the remaining tasks. (3 Marks) | Successfully completed only one of the four tasks and satisfactorily tried one of the remaining tasks. (2 Marks) | Failed to complete any given task. (0 Marks) |
| Criteria 3: (1) For L1 model, choose the best alpha value from the provided set of values. (2) For L2 model, choose the best lambda value from the provided set of values. (3) Evaluate the prediction performance on test data, report results and discuss if there is any sign of underfitting or overfitting with appropriate reasoning. | Successfully completed all three tasks. (5 Marks) | Successfully completed any two of the three tasks. (3 Marks) | Successfully completed any one of the three tasks. (2 Marks) | Failed to complete any given task. (0 Marks) |
| Criteria 4: Build a SVM (with RBF Kernel) classifier for this data. SVM with RBF takes 2 parameters: gamma (length scale of the RBF kernel) and C (the cost parameter). Use the following values for gamma: 1e-3, 1e-4. Use the following values for C: 1, 10, 100, 1000. Choose the best values of gamma and C using 10-fold cross-validation, based on model F1-score. Draw a surface plot of F1-score with respect to gamma and C. Use the best value of gamma and C to re-train the model on the training set and use it to predict the labels of the test set. Report the confusion matrix, multi-class averaged F1-score and accuracy. | Successfully completed all three tasks. (6 Marks) | Successfully completed any two of the three tasks. (4 Marks) | Successfully completed any one of the three tasks. (3 Marks) | Failed to complete any given tasks. (0 Marks) |
| Criteria 5: Build a Random forest classifier for this data. (6 Marks) Random forest uses two parameters: the tree-depth for each decision tree and the number of trees. Use the following values for the tree-depth: 300, 500, 600. Use the following values for the number of trees: 200, 500, 700. Choose the best values of tree-depth and number of trees using 10-fold cross-validation, based on model F1-score. Draw a surface plot of F1-score with respect to tree-depth and number of trees. Use the best value of tree-depth and number of trees to re-train the model on the training set and use it to predict the labels of the test set. Report the confusion matrix, multi-class averaged F1-score and accuracy. | Successfully completed all three tasks. (6 Marks) | Successfully completed any two of the three tasks. (4 Marks) | Successfully completed any one of the three tasks. (3 Marks) | Failed to complete any given task. (0 Marks) |
| Criteria 6: Write a brief discussion about which classification method achieved the best performance. Give your thoughts on the reason behind this. What method performed the worst? Could you do better or worse than the results in the dataset paper? Do you have any suggestions to further improve model performances? | Successfully completed all three tasks. (5 Marks) | Successfully completed any two of the three tasks. (3 Marks) | Successfully completed any one of the three tasks. (2 Marks) | Failed to complete any given task. (0 Marks) |
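
The K selection described in Task B (K from 1 to 50, 10-fold cross-validation, best K by F1-score) could be sketched roughly as follows. This is only an illustration under stated assumptions: it uses scikit-learn, it runs on synthetic data from `make_classification` as a stand-in for the real `X_train.txt` and `y_train.txt`, and it reads "F1-score" as the macro-averaged F1, which the brief does not specify.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (assumption); the real training files would be
# loaded here instead.
X, y = make_classification(n_samples=300, n_features=20, n_informative=8,
                           n_classes=3, random_state=0)

# 10-fold cross-validation; stratification and shuffling are choices made
# for this sketch, not requirements stated in the brief.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

k_values = list(range(1, 51))
mean_f1 = []        # used to pick the best K
mean_accuracy = []  # the quantity the brief asks to plot against K

for k in k_values:
    model = KNeighborsClassifier(n_neighbors=k)
    result = cross_validate(model, X, y, cv=cv,
                            scoring=["f1_macro", "accuracy"])
    mean_f1.append(result["test_f1_macro"].mean())
    mean_accuracy.append(result["test_accuracy"].mean())

# Best K is the one with the highest mean cross-validated macro F1.
best_k = k_values[int(np.argmax(mean_f1))]
print("best K:", best_k)
```

The `mean_accuracy` list, plotted against `k_values`, gives the cross-validation accuracy curve requested in the task; the final model would then be refit with `best_k` on the full training set before scoring on the test set.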
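
Tasks C, D and E share one pattern: search a two-parameter grid with 10-fold cross-validation, keep the F1-score for every grid cell for the surface plot, and re-train with the best pair. A minimal sketch for the Task D grid (gamma and C for an RBF SVM) is shown below. It assumes scikit-learn's `GridSearchCV` and `SVC`, synthetic stand-in data, and macro-averaged F1 as the selection score; none of these choices are prescribed by the brief.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

# Synthetic stand-in data (assumption); replace with the real training set.
X, y = make_classification(n_samples=300, n_features=20, n_informative=8,
                           n_classes=3, random_state=0)

# Parameter values taken from Task D.
gammas = [1e-3, 1e-4]
costs = [1, 10, 100, 1000]

search = GridSearchCV(
    SVC(kernel="rbf"),
    param_grid={"gamma": gammas, "C": costs},
    scoring="f1_macro",
    cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=0),
    refit=True,  # re-trains on all training data with the best parameters
)
search.fit(X, y)

# Arrange the mean cross-validated F1 into a (gamma, C) grid, which is the
# shape a surface plot needs.
f1_grid = np.zeros((len(gammas), len(costs)))
for params, score in zip(search.cv_results_["params"],
                         search.cv_results_["mean_test_score"]):
    row = gammas.index(params["gamma"])
    col = costs.index(params["C"])
    f1_grid[row, col] = score

print("best parameters:", search.best_params_)
print(f1_grid)
```

`f1_grid` can be handed to a 3D plotting routine such as matplotlib's `plot_surface`, and `search.best_estimator_` is the re-trained model to apply to the test set. For Task E the same structure should carry over with `RandomForestClassifier` and its `max_depth` and `n_estimators` parameters.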
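
Tasks B to E each end by reporting a confusion matrix, a multi-class averaged F1-score and accuracy. The sketch below computes all three in plain Python on a tiny made-up example, so the arithmetic is easy to follow. It assumes "multi-class averaged" means the macro average (the unweighted mean of per-class F1); the function names and the toy labels are illustrative only.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def macro_f1(matrix):
    """Unweighted mean of per-class F1, computed from a confusion matrix."""
    n = len(matrix)
    f1_scores = []
    for c in range(n):
        tp = matrix[c][c]
        fp = sum(matrix[r][c] for r in range(n)) - tp  # predicted c, wrong
        fn = sum(matrix[c]) - tp                       # true c, missed
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / n


def accuracy(matrix):
    """Fraction of instances on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


# Toy labels (made up for illustration): one class-1 instance is
# misclassified as class 2.
y_true = [1, 1, 1, 2, 2, 3]
y_pred = [1, 1, 2, 2, 2, 3]

cm = confusion_matrix(y_true, y_pred, labels=[1, 2, 3])
print(cm)                      # [[2, 1, 0], [0, 2, 0], [0, 0, 1]]
print(round(macro_f1(cm), 4))  # 0.8667
print(round(accuracy(cm), 4))  # 0.8333
```

Here classes 1 and 2 each get F1 = 0.8 and class 3 gets 1.0, so the macro average is 2.6 / 3, while accuracy is 5 correct out of 6.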
