# Assignment 3

#### Student ID: *Double click here to fill the Student ID*

#### Name: *Double click here to fill the name*

Firstly, install the following dependencies:

In [None]:
!pip install git+https://github.com/phonchi/playground-data.git -qq
!pip install cleanlab -qq
!pip install scikeras -qq

To ensure reproducibility, set all random seeds to 2023.
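A minimal sketch of one way to seed everything is shown below. The helper name `set_all_seeds` is hypothetical, and NumPy and TensorFlow are seeded only if they are installed; PyTorch users would call `torch.manual_seed(seed)` instead.

```python
import os
import random

SEED = 2023


def set_all_seeds(seed: int = SEED) -> None:
    """Seed the random number generators that are available (hypothetical helper)."""
    # PYTHONHASHSEED only affects Python processes started after this point
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy is not installed
    try:
        import tensorflow as tf
        tf.random.set_seed(seed)
    except ImportError:
        pass  # TensorFlow is not installed


set_all_seeds()
```

Calling the helper once at the top of the notebook, before any data is generated or any model is built, keeps the seeding in a single place.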

## Q1: Exploring the TensorFlow playground

[TensorFlow Neural Net Playground](http://playground.tensorflow.org/) is an interactive, web-based visualization tool to facilitate a deeper understanding of neural networks and their underlying concepts. It allows users to experiment with various neural network architectures, hyperparameters, and activation functions in real-time, without extensive coding or expertise in machine learning. The Playground features a simple, user-friendly interface that visually represents the neural network's structure and learning process. Users can adjust the number of layers, neurons, learning rate, regularization techniques, and more while observing how these changes impact the network's performance on synthetic datasets.

In this exercise, we will explore the web interface and replicate the experiment in `Python` (you are free to use `TensorFlow`, `PyTorch`, or other libraries).

#### (a) Execute the following steps first: (10%)
1. Choose the circle dataset (top-left dataset under "DATA" panel). 
2. Reduce the hidden layer to only one layer and change the activation function to "ReLU". 
3. Run the model five times. Before each trial, hit the "Reset the network" button to get a new random initialization. (The "Reset the network" button is the circular reset arrow just to the left of the Play button.) 
4. Let each trial run for at least 500 epochs to ensure convergence. 

Comment on the role of initialization in this non-convex optimization problem. Keeping all other parameters unchanged, what is the minimum number of neurons required in this single-hidden-layer network to ensure that it almost always converges to a global minimum (where the test loss is below 0.015)? Finally, paste the convergence results below.

* Note: the convergence pictures should show all the settings and the model. An example is available [here](https://drive.google.com/file/d/1zLN-bNtiHNc1x8Ne1-a572nRQRt7AMph/view?usp=sharing) (the example uses the default settings; change them according to the description above).

> Ans: *double click here to answer the question.*

The convergence results (double click to change the id):

<p align="center">
<img src="https://drive.google.com/uc?id=1zLN-bNtiHNc1x8Ne1-a572nRQRt7AMph" alt="drawing" width="600"/>
</p>

#### (b) Execute the following code to import the circle dataset and plot the data and decision boundary: (10%)

In [None]:
import plygdata as pg
from plygdata.playground import Player

data_noise = 0
validation_data_ratio = 0.5

# Generate data
data_array = pg.generate_data(pg.DatasetType.ClassifyCircleData, data_noise)
X_train, y_train, X_valid, y_valid = pg.split_data(data_array, validation_size=validation_data_ratio)

# Plot the data on the standard graph for Playground
fig, ax = pg.plot_points_with_playground_style(X_train, y_train, X_valid, y_valid, figsize = (6, 6), dpi = 100)
# draw the decision boundary of X1 input (feature)
pg.draw_decision_boundary(fig, ax, node_id=pg.InputType.X1, discretize=False);

Now build the DNN you found in (a) and train it with the SGD optimizer. Report the final accuracy on the validation set and plot the decision boundary using the following code:

```python
fig, ax = pg.plot_points_with_playground_style(X_train, y_train, X_valid, y_valid, figsize = (6, 6), dpi = 100)
xx = Player.get_boundary_array()
prob = model.predict(xx) # or model(xx)
pg.draw_decision_boundary(fig, ax, node_id=pg.InputType.X1, prob=prob, discretize=False);
```

Finally, plot the learning curve (loss and accuracy vs. epochs) during training. Do your results match (a)?

Hint: The label is `-1` and `1` by default in the playground; you can change them to 0, 1 and use `sigmoid` in the final layer with binary cross entropy as the loss function. In addition, your loss may be slightly higher than the one in the playground if you use binary cross entropy as the loss function. You don't have to deal with this discrepancy in this exercise.
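The label conversion and the loss mentioned in the hint can be sketched in plain Python as follows. The function names `to_binary_labels` and `binary_cross_entropy` are hypothetical, and the probabilities are made-up values; in practice the framework's built-in loss would be used.

```python
import math


def to_binary_labels(labels):
    """Map playground labels {-1, 1} to {0, 1}."""
    return [(int(v) + 1) // 2 for v in labels]


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross entropy for 0/1 targets and sigmoid outputs."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)


y01 = to_binary_labels([-1, 1, 1, -1])
print(y01)  # [0, 1, 1, 0]
print(binary_cross_entropy(y01, [0.1, 0.9, 0.8, 0.3]))
```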

In [None]:
# coding your answer here.

> Ans: *double click here to answer the question.*

#### (c) Execute the following steps first: (10%)
1. Change the dataset to the spiral (bottom-right dataset under "DATA" panel). 
2. Increase the noise level to 50 and leave the training and test set ratio unchanged. 
3. Train the best model you can. Feel free to add or remove layers and neurons. You can also change learning settings like learning rate, regularization rate, activations and batch size. In addition, you can also increase the input features to include interaction terms or others. Try to get the test loss below 0.15. 

How many parameters do you have in your models? Describe the model architecture and the training strategy you use. Finally, paste the convergence results below. 
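For a fully connected network, the parameter count is the sum over layers of the weights (inputs times units) plus one bias per unit. A small sketch, with the hypothetical helper `count_dense_params` and made-up layer sizes:

```python
def count_dense_params(layer_sizes):
    """Count weights and biases of a fully connected network.

    layer_sizes lists the width of every layer, input first and output last.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus one bias per unit
    return total


# Hypothetical example: 2 inputs, hidden layers of 8 and 4 units, 1 output
print(count_dense_params([2, 8, 4, 1]))  # 65
```

Extra input features (such as interaction terms) only change the first entry of `layer_sizes`.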

* You may need to train the model for many epochs and adjust the learning rate manually as training progresses.

> Ans: *double click here to answer the question.*

The convergence results (double click to change the id):

<p align="center">
<img src="https://drive.google.com/uc?id=1zLN-bNtiHNc1x8Ne1-a572nRQRt7AMph" alt="drawing" width="600"/>
</p>

#### (d) Execute the following code to import the noisy spiral dataset and plot the data and decision boundary: (10%)

In [None]:
data_noise=0.5
validation_data_ratio = 0.5

# Generate data
data_array = pg.generate_data(pg.DatasetType.ClassifySpiralData, data_noise)


X_train, y_train, X_valid, y_valid = pg.split_data(data_array, validation_size=validation_data_ratio)


# Plot the data on the standard graph for Playground
fig, ax = pg.plot_points_with_playground_style(X_train, y_train, X_valid, y_valid, figsize = (6, 6), dpi = 100)
pg.draw_decision_boundary(fig, ax, node_id=pg.InputType.X1, discretize=False);

Now build the DNN you found in (c) and train it with the SGD optimizer. Report the final accuracy on the validation set and plot the decision boundary using the following code:

```python
fig, ax = pg.plot_points_with_playground_style(X_train, y_train, X_valid, y_valid, figsize = (6, 6), dpi = 100)
xx = Player.get_boundary_array()
prob = model.predict(xx) # or model(xx)
pg.draw_decision_boundary(fig, ax, node_id=pg.InputType.X1, prob=prob, discretize=False);
```


Finally, plot the learning curve during training. Do your results match (c)?

Hint: Your loss may be slightly higher than the one in the playground if you use binary cross entropy as the loss function. You don't have to deal with this discrepancy in this exercise.

In [None]:
# coding your answer here.

> Ans: *double click here to answer the question.*

#### (e) You may find that the learning curve you get in (d) is noisy and requires many epochs to converge. Try to improve the DNN in (d) by changing the network architecture, learning rate schedule, or optimizer so that the learning curve becomes smoother and converges faster. (10%)
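As one illustration of a learning rate schedule, the sketch below implements exponential decay in plain Python. The helper name `exponential_decay` and all numeric values are assumptions chosen for illustration, not recommended settings. The returned function accepts an optional second argument so that it could, for instance, be passed to a callback such as Keras `LearningRateScheduler`.

```python
def exponential_decay(initial_lr, decay_rate, decay_epochs):
    """Return a function mapping an epoch index to a learning rate."""

    def schedule(epoch, current_lr=None):
        # current_lr is ignored; it is accepted only for callback compatibility
        return initial_lr * decay_rate ** (epoch / decay_epochs)

    return schedule


# Made-up settings: halve the learning rate every 100 epochs
schedule = exponential_decay(initial_lr=0.03, decay_rate=0.5, decay_epochs=100)
for epoch in (0, 100, 200):
    print(epoch, schedule(epoch))
```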

Finally, plot the learning curve during training and draw the decision boundary using the following code:

```python
fig, ax = pg.plot_points_with_playground_style(X_train, y_train, X_valid, y_valid, figsize = (6, 6), dpi = 100)
xx = Player.get_boundary_array()
prob = model.predict(xx) # or model(xx)
pg.draw_decision_boundary(fig, ax, node_id=pg.InputType.X1, prob=prob, discretize=False);
```

In [None]:
# coding your answer here.

## Q2: Exploring the CNN Explainer

[CNN Explainer](https://poloclub.github.io/cnn-explainer/) is an interactive, open-source visualization tool designed to provide a comprehensive understanding of Convolutional Neural Networks (CNNs). The explainer aims to demystify the inner workings of CNNs through visualizations and step-by-step explanations. The platform offers a guided walkthrough of the building blocks of CNNs, including convolutional layers, activation functions, pooling layers, and fully connected layers. It allows users to interactively explore the components, visualize feature maps, and understand the effects of different hyperparameters on the network's performance. 

In this exercise, we will explore the web interface and replicate the experiment in `Python` (you are free to use `TensorFlow`, `PyTorch`, or other libraries).

#### (a) Firstly, explore the CNN explainer and answer the following questions: (10%)

1. What is the shape of the input and output of the network?
2. What are the kernel size, stride, padding, and number of filters used in all the conv layers?
3. What are the kernel size, stride, and number of filters used in all the pooling layers?
4. How many parameters are used in the final dense layer?
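The quantities in these questions are related by standard formulas: along one axis, a convolution or pooling layer maps size $n$ to $\lfloor (n + 2p - k)/s \rfloor + 1$ for kernel size $k$, stride $s$, and padding $p$. The sketch below uses hypothetical helper names and deliberately made-up layer sizes, not the CNN Explainer's values.

```python
def conv_output_size(n, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer along one axis."""
    return (n + 2 * padding - kernel) // stride + 1


def conv_params(kernel, in_channels, filters):
    """Trainable parameters of a 2D conv layer with square kernels and biases."""
    return (kernel * kernel * in_channels + 1) * filters


def dense_params(n_in, n_out):
    """Trainable parameters of a dense layer with biases."""
    return (n_in + 1) * n_out


# Hypothetical layer sizes for illustration only
after_conv = conv_output_size(32, kernel=5)  # 28
after_pool = conv_output_size(after_conv, kernel=2, stride=2)  # 14
print(after_conv, after_pool)
print(conv_params(kernel=5, in_channels=3, filters=6))  # 456
print(dense_params(14 * 14 * 6, 10))  # 11770
```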

> Ans: *double click here to answer the question.*

#### (b) Based on the observation in (a), build the same CNN using `Python` and report the total number of parameters and architecture using `summary()`. Remember to rescale or normalize the input before feeding it into the network. (10%)

In [None]:
# coding your answer here.

#### (c) Download the dataset from our course website and load the training, validation, and test datasets from the folders `class_10_train`, `class_10_val/val_images` and `class_10_val/test_images`, respectively. Remember to resize the images to $64 \times 64$ and set the batch size to 32. Finally, draw `9` random samples from the training set and plot them. (10%)

In [None]:
# coding your answer here.

#### (d) We are going to use the data loaded in (c) to train the CNN in (b). First, add a callback that monitors the validation loss and saves the best model based on it. Secondly, train the model you built in (b) with the `Adam` optimizer for 50 epochs. Thirdly, plot the learning curve after training. Finally, reload the best model and report the accuracy on the test set. (10%)
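The logic behind a "save the best model" callback can be sketched without any framework, as below. The class `BestModelTracker` is hypothetical and the validation losses are made-up values. In Keras, `ModelCheckpoint` with `monitor="val_loss"` and `save_best_only=True` provides this behavior.

```python
class BestModelTracker:
    """Track the lowest validation loss seen so far (hypothetical helper)."""

    def __init__(self):
        self.best_loss = float("inf")
        self.best_epoch = None

    def update(self, epoch, val_loss):
        """Return True when val_loss improves, i.e. when a checkpoint should be saved."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.best_epoch = epoch
            return True
        return False


tracker = BestModelTracker()
val_losses = [1.9, 1.4, 1.2, 1.3, 1.25]  # made-up values for illustration
saved = [epoch for epoch, loss in enumerate(val_losses) if tracker.update(epoch, loss)]
print(saved, tracker.best_epoch)  # [0, 1, 2] 2
```

Because later epochs can be worse than the best one, the saved checkpoint, not the final weights, is what should be reloaded before testing.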

In [None]:
# coding your answer here.

#### (e) Looking at the learning curves, you can see that the model is overfitting. Try to add a data augmentation layer for the model in (b) as follows: (10%)

* Applies random horizontal flipping 
* Rotates the input images by a random value in the range `[-36 degrees, +36 degrees]`
* Zooms in or out of the image by a random factor in the range `[-20%, +20%]`
* Randomly chooses a location to crop images down to a target size `[56, 56]`
* Randomly adjusts the contrast of images by a factor in the range `[0.85, 1.15]` relative to the original image
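Some augmentation APIs express the rotation range as a fraction of a full turn rather than in degrees; Keras `RandomRotation`, for example, documents its factor as a fraction of $2\pi$. Assuming such an API is used, the conversion is a one-liner (the helper name is hypothetical):

```python
def degrees_to_turn_fraction(degrees):
    """Convert an angle in degrees to a fraction of a full 360-degree turn."""
    return degrees / 360.0


print(degrees_to_turn_fraction(36))  # 0.1
```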

Fit your model for enough epochs (75, for instance), compare its performance and learning curves with the previous model in (d), and comment on the results. Finally, report the accuracy on the test set. Remember to reload the best model before testing.

In [None]:
# coding your answer here.

> Ans: *double click here to answer the question.*

#### (f) Use `cleanlab` to find possible label issues in the validation set using the model you built in (b). You can follow the procedure below: (10%)

1. Wrap the model with `scikeras` or `skorch`. Set the optimizer, epochs and batch size to `Adam`, `30` and `32`, respectively.
2. Extract the image data and labels from the validation set into `X` and `y` `NumPy` arrays.
3. Get the out-of-sample prediction probabilities using 
```python
pred_probs = cross_val_predict(
    clf,
    X,
    y,
    cv=3,
    method="predict_proba",
)
```
4. Find the top 9 possible label issues using `find_label_issues()` and plot them using `plot_examples()` provided below.

Comment on your results.

Hint: You can also inspect the `class_dict.json` file to see the corresponding class for each label.
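To build intuition for what a ranking of label issues means, the sketch below orders examples by self-confidence, i.e. the predicted probability of the given label, lowest first. This is a simplified illustration with made-up data and a hypothetical helper name, not `cleanlab`'s full procedure; `find_label_issues()` also estimates which examples are likely mislabeled, and accepts `return_indices_ranked_by="self_confidence"` to return ranked indices.

```python
def rank_by_self_confidence(labels, pred_probs, top_k=9):
    """Return indices of the top_k examples with the lowest self-confidence."""
    scores = [probs[label] for label, probs in zip(labels, pred_probs)]
    order = sorted(range(len(labels)), key=lambda i: scores[i])
    return order[:top_k]


# Made-up labels and out-of-sample probabilities for three classes
labels = [0, 1, 2, 1]
pred_probs = [
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],  # labeled 1 but the model prefers class 0
    [0.1, 0.1, 0.8],
    [0.3, 0.6, 0.1],
]
print(rank_by_self_confidence(labels, pred_probs, top_k=2))  # [1, 3]
```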

In [None]:
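import matplotlib.pyplot as plt


def plot_examples(id_iter, nrows=1, ncols=1):
    # Assumes global arrays `X` (images) and `labels` (class labels, e.g. `y` from step 2)
    for count, id in enumerate(id_iter):
        plt.subplot(nrows, ncols, count + 1)
        # Each image is reshaped to 64 x 64 RGB with pixel values in [0, 255]
        plt.imshow(X[id].reshape(64, 64, 3).astype("uint8"))
        plt.title(f"id: {id} \n label: {labels[id]}")
        plt.axis("off")

    plt.tight_layout(h_pad=2.0)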

In [None]:
# coding your answer here.

> Ans: *double click here to answer the question.*