Asked By: Troy Davis
031250 Introduction to Data Analytics
Assignment 2: Data Exploration and Preparation
KNIME guide
I have been given a dataset, but because I lack knowledge of KNIME, I do not really know what to do with my assigned dataset in KNIME to get results that meet the requirements of this assignment.
A3. Using KNIME or other tools, explore your dataset and identify any outliers, clusters of similar instances, "interesting" attributes and specific values of those attributes. Note that you may need to 'temporarily' recode attributes to numeric or from numeric to nominal. The report includes the corresponding snapshots from the tools and an explanation of what has been identified there.
Present your findings in the assignment report.
B. Data preprocessing
Perform each of the following data preparation tasks (each task applies to the original data) using your choice of tool:
B1. Use the following binning techniques to smooth the values of the Rainfall attribute:
Equi-width binning
Equi-depth binning.
In the assignment report, for each of these techniques, you need to illustrate your steps. In your Excel workbook file place the results in separate columns in the corresponding spreadsheet. Use your judgement in choosing the appropriate number of bins - and justify this in the report.
B2. Use the following techniques to normalise the attribute MaxTemp:
min-max normalization to transform the values onto the range [0.0-1.0].
z-score normalization to transform the values.
The assignment report provides an explanation of each of the applied techniques. In your Excel workbook file place the results in separate columns in the corresponding spreadsheet.
B3. Discretise the WindSpeed3pm attribute into the following categories: Slow Wind, Medium Wind, Fast Wind, and Very Fast Wind. Provide the frequency of each category in your dataset.
The assignment report explains each of the applied techniques. In your Excel workbook file place the results in a separate column in the corresponding spreadsheet.
B4. Binarise the WindDir9am variable [with values "0" or "1"].
The assignment report explains the applied binarisation technique. In your Excel workbook file place the results in separate columns in the corresponding spreadsheet.



Answers:


To successfully complete Assignment 2 on data exploration and preparation using KNIME, follow these step-by-step guidelines to address tasks A3 and B:

A3. Data Exploration Using KNIME

1. Load Your Dataset:

  • Open KNIME Analytics Platform.
  • Use the "File Reader" node (or the "CSV Reader" or "Excel Reader" node, depending on the file format) to import your dataset.

2. Exploring the Dataset:

  • Identify Outliers:

    • Use the "Box Plot" node or "Numeric Outliers" node to detect and visualize outliers.
    • Take snapshots of the plots and include them in your report.
  • Identify Clusters:

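    • Use clustering nodes such as "k-Means" or "Hierarchical Clustering" on the numeric attributes to find groups of similar instances. Normalize these attributes first: both methods rely on distances, so attributes with large ranges would otherwise dominate.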
    • Visualize the clusters using the "Scatter Plot" node.
    • Present the clusters with relevant explanations and snapshots in your report.
  • Identify Interesting Attributes and Values:

    • Utilize the "Statistics" node to get a summary of each attribute.
    • Use the "Column Filter" node to select key attributes and inspect them with the "Interactive Table" node.
    • Make sure to document and take snapshots, and provide explanations about any interesting attributes and specific values.
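To see what the "Numeric Outliers" node does, here is a minimal Python sketch of the interquartile-range (IQR) rule: values below Q1 - k * IQR or above Q3 + k * IQR are flagged, with k = 1.5 as the usual choice. The Rainfall readings are made-up illustration data, and tools interpolate quartiles differently, so the fences may not match KNIME's exactly.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    # statistics.quantiles uses 'exclusive' interpolation by default
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical Rainfall values (mm), for illustration only
rainfall = [0.0, 0.0, 0.2, 1.0, 0.0, 3.4, 0.6, 0.0, 45.2, 1.2, 0.0, 2.8]
print(iqr_outliers(rainfall))  # [45.2]
```

Here Q1 is 0.0 and Q3 is about 2.4, so anything above roughly 6.0 mm is flagged. A heavily skewed attribute like rainfall will often produce many "outliers" under this rule, which is itself worth commenting on in the report.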

3. Recoding Attributes:

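  • Use the "Number To String" node to convert numeric attributes to nominal, and the "String To Number" node for numbers stored as text. To recode genuinely nominal values (such as wind directions) as numeric codes, use the "Category To Number" node.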
  • Document these transformations with screenshots and explanations in the report.

B. Data Preprocessing

B1. Binning Techniques for Rainfall:

Equi-width binning:

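  • Use the "Auto-Binner" node with a fixed number of bins and the equal-width option.
  • Create equal-width bins for the Rainfall attribute.
  • Select an appropriate number of bins based on the data range and distribution, and justify your choice in the report.
  • Binning alone only assigns bin labels. To smooth the values, replace each value with a representative of its bin (the bin mean, median or nearest boundary), for example by computing bin means with the "GroupBy" node and joining them back with the "Joiner" node.
  • Write the binned and smoothed values to your Excel workbook (for example with the "Excel Writer" node), each in its own column.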
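A minimal Python sketch of equal-width binning followed by smoothing by bin means, assuming a plain list of Rainfall values with no missing entries (the numbers are hypothetical). Each bin spans (max - min) / n_bins, and the maximum value, which lands exactly on the upper edge, is clamped into the last bin.

```python
def equal_width_bins(values, n_bins):
    """Return a bin index (0 .. n_bins - 1) for each value using equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    bins = []
    for v in values:
        idx = int((v - lo) / width) if width > 0 else 0
        # The maximum value falls on the upper edge, so clamp it into the last bin
        bins.append(min(idx, n_bins - 1))
    return bins


def smooth_by_bin_means(values, bins):
    """Replace each value with the mean of all values in the same bin."""
    totals, counts = {}, {}
    for v, b in zip(values, bins):
        totals[b] = totals.get(b, 0.0) + v
        counts[b] = counts.get(b, 0) + 1
    return [totals[b] / counts[b] for b in bins]


# Hypothetical Rainfall values (mm), for illustration only
rainfall = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
bins = equal_width_bins(rainfall, n_bins=2)     # [0, 0, 0, 1, 1, 1]
smoothed = smooth_by_bin_means(rainfall, bins)  # [2.0, 2.0, 2.0, 8.0, 8.0, 8.0]
```

With real rainfall data, most values sit near zero, so equal-width bins tend to leave the upper bins almost empty; that observation can support your choice of bin count.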

Equi-depth binning:

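  • Similarly, use the "Auto-Binner" node on the same Rainfall attribute, this time with the equal-frequency (equal-depth) option.
  • Choose the number of bins; the bin boundaries are then set so that each bin contains roughly the same number of instances. If many records share the same value (for example, many days with 0 mm of rain), some bins may be unevenly filled or collapse together, which is worth discussing in the report.
  • Smooth the values in the same way as for equi-width binning and save the results in the Excel workbook, in columns separate from the equi-width results.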
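A minimal Python sketch of equal-depth (equal-frequency) binning, here paired with smoothing by bin boundaries, the other common smoothing variant: each value is replaced by the nearer of its bin's minimum and maximum. The values are hypothetical, and bins are assigned by rank, so tied values at a bin edge can be split across two bins; KNIME may handle ties differently.

```python
def equal_depth_bins(values, n_bins):
    """Return a bin index per value so each bin holds (nearly) the same number of values."""
    n = len(values)
    # Positions of the values in ascending order (ties keep their original order)
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        # Spread ranks 0 .. n-1 evenly over bins 0 .. n_bins-1
        bins[i] = rank * n_bins // n
    return bins


def smooth_by_bin_boundaries(values, bins):
    """Replace each value with the closer of its bin's minimum and maximum."""
    lows, highs = {}, {}
    for v, b in zip(values, bins):
        lows[b] = min(lows.get(b, v), v)
        highs[b] = max(highs.get(b, v), v)
    return [lows[b] if v - lows[b] <= highs[b] - v else highs[b]
            for v, b in zip(values, bins)]


# Hypothetical Rainfall values (mm), for illustration only
rainfall = [4, 8, 15, 21, 21, 24, 25, 28, 34]
bins = equal_depth_bins(rainfall, n_bins=3)          # [0, 0, 0, 1, 1, 1, 2, 2, 2]
smoothed = smooth_by_bin_boundaries(rainfall, bins)  # [4, 4, 15, 21, 21, 24, 25, 25, 34]
```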

B2. Normalization of MaxTemp:

Min-max normalization:

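  • Add the "Normalizer" node and select min-max normalization for the MaxTemp attribute.
  • Set the new range to [0.0, 1.0], so that each value x becomes (x - min) / (max - min).
  • The Normalizer replaces the column's values, so run each normalization on a copy of MaxTemp (or in a separate branch of the workflow) and save the min-max results in their own column of the Excel workbook.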
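Min-max normalization maps each value x to new_min + (x - min) * (new_max - new_min) / (max - min). A small Python sketch with hypothetical MaxTemp values, assuming the column has no missing values and is not constant:

```python
def min_max(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("min-max normalization is undefined for a constant column")
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in values]


# Hypothetical MaxTemp values (degrees C), for illustration only
max_temp = [12.0, 18.0, 24.0, 36.0]
scaled = min_max(max_temp)  # [0.0, 0.25, 0.5, 1.0]
```

Because the minimum and maximum define the scale, a single extreme MaxTemp value compresses all other values into a narrow range; this is one reason to compare it with z-score normalization in the report.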

z-score normalization:

  • Use a second "Normalizer" node on the original MaxTemp attribute, this time with z-score normalization.
  • Each value x becomes (x - mean) / standard deviation, so the new column has mean 0 and standard deviation 1. Save the results in the Excel workbook in a separate column.
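A minimal Python sketch of z-score normalization with hypothetical MaxTemp values. It assumes the sample standard deviation (dividing by n - 1); tools differ on sample versus population standard deviation, so check which one your tool uses before comparing numbers.

```python
import statistics


def z_scores(values):
    """Standardize values to mean 0 and standard deviation 1."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    if sd == 0:
        raise ValueError("z-score normalization is undefined for a constant column")
    return [(v - mean) / sd for v in values]


# Hypothetical MaxTemp values (degrees C), for illustration only
max_temp = [15.0, 25.0, 35.0]
standardized = z_scores(max_temp)  # [-1.0, 0.0, 1.0]
```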

B3. Discretising WindSpeed3pm:

Discretization:

  • Use the "Rule Engine" node.
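  • Create rules to categorize WindSpeed3pm into "Slow Wind", "Medium Wind", "Fast Wind" and "Very Fast Wind". The thresholds below are only an example; choose cut-offs that suit your data (for example, based on its distribution) and justify them in the report. The Rule Engine evaluates the rules from top to bottom and uses the first one that matches: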
    $WindSpeed3pm$ <= 10 => "Slow Wind"
    $WindSpeed3pm$ > 10 AND $WindSpeed3pm$ <= 20 => "Medium Wind"
    $WindSpeed3pm$ > 20 AND $WindSpeed3pm$ <= 30 => "Fast Wind"
    $WindSpeed3pm$ > 30 => "Very Fast Wind"
    
  • Calculate the frequency of each category with the "GroupBy" node (group by the new category column and use the Count aggregation) or with the "Value Counter" node.
  • Document the results and save them in the Excel workbook.
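The rules above can be checked outside KNIME with a short Python sketch that applies the same example thresholds in the same top-to-bottom order and then counts each category, much like a GroupBy count. The speeds are hypothetical, and missing values are not handled here.

```python
from collections import Counter


def wind_category(speed):
    """Map a wind speed to a category using the example thresholds (first match wins)."""
    if speed <= 10:
        return "Slow Wind"
    elif speed <= 20:
        return "Medium Wind"
    elif speed <= 30:
        return "Fast Wind"
    return "Very Fast Wind"


# Hypothetical WindSpeed3pm values, for illustration only
speeds = [4, 9, 13, 19, 22, 26, 28, 35, 41, 11]
categories = [wind_category(s) for s in speeds]
frequency = Counter(categories)
# Counter({'Medium Wind': 3, 'Fast Wind': 3, 'Slow Wind': 2, 'Very Fast Wind': 2})
```

Because the checks run in order, each branch only needs its upper bound, which mirrors how the Rule Engine's first-match behaviour makes the lower bounds in the rules redundant but harmless.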

B4. Binarising WindDir9am:

Binarization:

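  • The assignment asks for the results in separate columns, so the most direct option is the "One to Many" node, which creates one column per distinct WindDir9am value containing 1 if the row has that value and 0 otherwise.
  • Alternatively, use the "Column Expressions" node to create one indicator column at a time, replacing "some_value" with an actual direction such as "N":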
    if (column("WindDir9am") == "some_value") {
      1
    } else {
      0
    }
    
  • Export the binarized column(s) to the Excel workbook, keeping them separate from the original WindDir9am column.
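Binarizing a nominal attribute into separate 0/1 columns is usually done with one-hot encoding: one column per distinct value. A minimal Python sketch with hypothetical WindDir9am values; as an assumption of this sketch, a missing direction (None) gets 0 in every column rather than a column of its own, so decide how you want to treat missing values and state it in the report.

```python
def one_hot(values):
    """Return one 0/1 column per distinct category, keyed by category name."""
    # Sorted so the column order is stable; None (missing) does not get its own column
    categories = sorted({v for v in values if v is not None})
    return {c: [1 if v == c else 0 for v in values] for c in categories}


# Hypothetical WindDir9am values, for illustration only
wind_dir_9am = ["N", "SW", "N", "E", None, "SW"]
columns = one_hot(wind_dir_9am)
# columns["N"]  == [1, 0, 1, 0, 0, 0]
# columns["SW"] == [0, 1, 0, 0, 0, 1]
# columns["E"]  == [0, 0, 0, 1, 0, 0]
```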

Report Preparation

For each task, ensure you have included:

  • Snapshots: Provide visual confirmation of steps taken (charts, intermediate tables, etc.).
  • Explanations: Clearly explain the method and findings.
  • Justifications: Explain why you chose each technique and parameter (for example, the number of bins or the wind-speed thresholds).

Save all KNIME workflows and results to include as supporting documentation.

Following these steps will help you systematically explore and preprocess your dataset using KNIME, ensuring you meet the requirements for your assignment. Feel free to reach out if you need further clarification on any step.


Answered By

Joe Evans
