CIS4035-N Machine Learning Application and Report

TEESSIDE UNIVERSITY 
School of Computing, Engineering and Digital Technologies 
Module Title:  Machine Learning 
Module Code: CIS4035-N 
Assignment Title:  Machine Learning Application and Report 

Personal and Transferable Skills

  1. Select, apply and defend the selection and application of machine learning methodologies and experiments in academic reports.
  2. Demonstrate a systematic understanding of machine learning algorithms and their selection for solving a specific problem.

Research, Knowledge and Cognitive Skills

  1. Investigate state-of-the-art machine learning algorithms.
  2. Design appropriate representations of machine learning problems for input into machine learning packages and critically evaluate their effectiveness.
  3. Design and evaluate neural network configurations and learning mechanisms for sample problems.
  4. Analyse empirical results of the selected machine learning algorithms and justify the performance.

Professional Skills

  1. Autonomously implement and evaluate appropriate machine learning techniques for particular learning tasks, taking into consideration professional, ethical and legal issues.

Task Description

Machine learning problems vary from one domain to another. In this coursework, you will select a dataset related to a real-world problem that best suits your area of interest. Many websites provide publicly available datasets. A categorised list of datasets on GitHub can be found at https://github.com/caesar0301/awesome-public-datasets. The UCI Machine Learning Repository at https://archive.ics.uci.edu/ml/index.php is another longstanding source of benchmark datasets for data mining and machine learning research. Kaggle (https://www.kaggle.com/datasets) also hosts interesting real-world problems and datasets.

You can select a dataset from the above sources, or another one that is available online. The dataset should be publicly available. The chosen dataset should have a minimum of 1,000 instances (rows) and a minimum of 5 attributes (columns). You have to complete the following stages in this assignment:

  1. Define the problem for the selected dataset and identify the machine learning algorithms that are applicable to this problem.
  2. Data exploration and preparation: The nature of the dataset may dictate some data exploration and preparation that can help inform the solutions. For example, high-dimensional datasets (those with too many attributes/columns) may require a data reduction method such as Principal Component Analysis (PCA).
  3. Propose solutions: In this step, you will propose three machine learning algorithms that are applicable to the selected dataset/problem.
  4. Design, implementation, modelling and evaluation: Design, model and implement the proposed solutions and critically evaluate them. Use appropriate visualisation for the results.
  5. Reflect on professional, ethical and legal issues in relation to the problem and the dataset.
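
As an illustration only, the data preparation, solution proposal and evaluation stages above could be sketched as follows in Python with scikit-learn. Everything here is an assumption rather than a requirement of this brief: the choice of scikit-learn, the synthetic data standing in for a real public dataset, the three classifiers, the number of PCA components and the helper names `meets_size_requirement` and `compare_models`. Your own problem may be a regression or clustering task and may call for different algorithms and metrics.

```python
# Illustrative sketch only: synthetic data stands in for a real public dataset.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

MIN_ROWS, MIN_COLS = 1000, 5  # minimum dataset size stated in the brief


def meets_size_requirement(X):
    """Return True if X has at least MIN_ROWS rows and MIN_COLS columns."""
    n_rows, n_cols = X.shape
    return n_rows >= MIN_ROWS and n_cols >= MIN_COLS


def compare_models(X, y, n_components=10, folds=5):
    """Cross-validate three candidate classifiers behind scaling and PCA."""
    # Hypothetical candidates; pick algorithms that suit your own problem.
    candidates = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
        "k_nearest_neighbours": KNeighborsClassifier(n_neighbors=5),
    }
    results = {}
    for name, model in candidates.items():
        # Scaling and PCA are refitted inside each fold, so the held-out
        # fold never influences the preprocessing.
        pipeline = make_pipeline(
            StandardScaler(), PCA(n_components=n_components), model
        )
        scores = cross_val_score(pipeline, X, y, cv=folds, scoring="accuracy")
        results[name] = (scores.mean(), scores.std())
    return results


# Synthetic stand-in for a real dataset (hypothetical: 1,200 rows, 30 attributes).
X, y = make_classification(
    n_samples=1200, n_features=30, n_informative=8, random_state=0
)
assert meets_size_requirement(X)

results = compare_models(X, y)
for name, (mean_acc, std_acc) in results.items():
    print(f"{name}: accuracy {mean_acc:.3f} +/- {std_acc:.3f}")
```

Placing the scaler and PCA inside the pipeline is a deliberate choice in this sketch: fitting them on the full dataset before cross-validation would leak information from the held-out folds into the evaluation. Accuracy is used only for brevity; an imbalanced dataset would need other metrics.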

Element 1 Deliverable – Contributes 50% of the Module Mark

Element 1 will assess learning outcomes LO 2, 3, 4, 5 and 6

What to Hand In

  • Online: a PDF file, submitted via Blackboard, that includes all source code and screenshots from your experiments, appropriately labelled and commented.

You need to demonstrate your code and results in the practical sessions in the last week (w/c 27th April 2020).

The code and experiments will be assessed on:

  • Appropriateness of the machine learning algorithms selected for the given task
  • Quality of software architecture and implementation
  • Quantitative performance of application

Element 2 – Contributes 50% of the Module Mark

Element 2 will assess learning outcomes LO 1, 2 and 7

What to Hand In

  • A case study report of at most 2,000 words that documents the entire case study process, including the dataset, problem, data preparation and exploration, selected algorithms, critical evaluation and justification of the algorithms, and findings.
  • Online: a PDF file, submitted via Turnitin on Blackboard.

Submission is electronic via Blackboard. All deliverables shall be labelled with the project name, your name and your university number.

The report will be assessed on:

  • understanding of machine learning task
  • review of relevant literature
  • development methodology
  • justification of design decisions
  • consideration of professional, ethical and legal issues

The report could broadly include the following sections:

  • Abstract
  • Introduction (introduce the problem and its significance, write short literature review of related work)
  • Data exploration and features selection
  • Experiments
  • Results
  • Discussion, Conclusions and Future Work
  • References

These are generic section titles, which you may adapt appropriately to the application/problem that is investigated. You may include sections describing modifications of algorithms or developments that are novel and specific to your work. 

Marking Criteria

Grade

SOURCE CODE DOCUMENTATION AND DEMO

Excellent 70% and above

Clear evidence of running the experiments with code that is excellently organised and commented.

Machine learning algorithms selected are appropriate for the given task.

Excellent quality of software architecture and implementation. Excellent quantitative performance of application. Deep understanding shown.

Very Good 60% - 69%

Very good evidence of running the experiments with code that is well organised and commented.

Machine learning algorithms selected are appropriate for the given task.

Very good quality of software architecture and implementation. Very good quantitative performance of application. Very good understanding.

Satisfactory 50% - 59%

Satisfactory evidence of running the experiments with code that is organised and commented.

Machine learning algorithms selected are appropriate for the given task.

Satisfactory quality of software architecture and implementation. Satisfactory quantitative performance of application. Satisfactory understanding.

Fail Less than 50%

Little evidence of running the experiments with code that is not well organised and commented.

Machine learning algorithms selected are not appropriate for the given task.

Poor quality of software architecture and implementation. Poor quantitative performance of application. Poor understanding.

NS Non-submission

N/A

Grade

ACADEMIC QUALITY OF THE PAPER - 50%

Excellent 70% and above

Excellent technical quality (rigour of the experiments, data preparation, justification and correct application of the selected algorithms and suitability of the selection). 

Produced and demonstrated a comprehensive, high quality solution to the problem. Sufficient information for the reader is provided to reproduce the results.

Outstanding evidence of systematic review using multiple high quality academic sources. Logical, clear development of narrative. High quality references and citations. 

Outstanding evaluation and discussion of the significance of the results (Why are the results important? How does the paper advance the state of the art? How would the results be useful to other researchers or practitioners? Is this a “real” problem or a small “toy” problem?).

Legal, social, ethical, security and professional issues fully considered. The paper could, with minor modifications, be suitable for publication or form the basis for a postgraduate project. There is some element of a novel approach to the problem or novel use of techniques.

Very Good 60% - 69%

Very good technical quality. 

Produced and demonstrated a very good quality solution to the problem.

Sufficient information for the reader is provided to reproduce the results. Very good evidence of systematic review using multiple high quality academic sources. Logical, clear development of narrative. Appropriate references and citations. 

Very good evaluation and discussion of the significance of the results.

Legal, social, ethical, security and professional issues fully considered.

Satisfactory 50% - 59%

Satisfactory technical quality. 

Produced and demonstrated a good quality solution to the problem.

Good evidence of reviewing multiple academic sources. Some references and citations. 

Good evaluation and discussion of the significance of the results.

Legal, social, ethical, security and professional issues fully considered.

Fail Below 50%

Inadequate technical quality.

Produced and demonstrated a solution to the problem, which is flawed, despite some effort. 

Poor evidence of reviewing academic sources. 

Little evaluation and discussion of the results.

Little consideration of legal, social, ethical, security and professional issues.

Narrative difficult to follow. Poor quality of references and citations.

NS Non-submission

N/A