Exam text content

SGN-41007 Pattern Recognition and Machine Learning - 01.03.2017 (Exam, Huttunen)

The text is generated with Optical Character Recognition (OCR) from the original exam file and can therefore contain erroneous or incomplete information. For example, mathematical symbols cannot be rendered correctly. The text is mainly used for generating search results.

Original exam
 

 

SGN-41007 Pattern Recognition and Machine Learning
Exam 1.3.2017
Heikki Huttunen

 

- Use of calculator is allowed.
- Use of other materials is not allowed.
- The exam questions need not be returned after the exam.
- You may answer in English or Finnish.

1. Describe the following terms and concepts by a few sentences. (max. 6 p.)

(a) Rectified linear unit

(b) Linear classifier

(c) Ensemble classifier

(d) Multilabel classifier

(e) Stratified cross-validation
(f) L1 regularization

2. The Poisson distribution is a discrete probability distribution that expresses the probability of a number of events x ≥ 0 occurring in a fixed period of time:

p(x; λ) = (λ^x e^{−λ}) / x!

We measure N samples: x_0, x_1, ..., x_{N−1} and assume they are Poisson distributed and independent of each other.

(a) Compute the probability p(x; λ) of observing the samples x = (x_0, x_1, ..., x_{N−1}). (1p)
(b) Compute the natural logarithm of p, i.e., log p(x; λ). (1p)
(c) Differentiate the result with respect to λ. (2p)
(d) Find the maximum of the function, i.e., the value where ∂/∂λ log p(x; λ) = 0. (2p)
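The derivation in parts (a)-(d) leads to the standard result that the maximum-likelihood estimate of λ is the sample mean; a minimal numerical sketch (the sample values below are hypothetical, not the exam's):

```python
import math

def poisson_mle(samples):
    """ML estimate of the Poisson rate: the derivative of the
    log-likelihood, -N + (1/lam) * sum(x_n), vanishes at the sample mean."""
    return sum(samples) / len(samples)

def poisson_log_likelihood(samples, lam):
    """log p(x; lam) = sum_n (x_n * log(lam) - lam - log(x_n!))"""
    return sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x in samples)

samples = [2, 3, 1, 4, 0, 2]      # hypothetical measurements
lam_hat = poisson_mle(samples)    # sample mean = 2.0
# The MLE dominates any other rate on the log-likelihood:
assert poisson_log_likelihood(samples, lam_hat) > poisson_log_likelihood(samples, 3.0)
```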

3. A dataset consists of two classes, containing four samples each. The samples are shown in
Figure 1. The classes are linearly separable, and there are many linear decision boundaries
that classify the training set with 100 % accuracy.

(a) Find one such linear classifier. You can use whatever method you want (except the LDA), but justify your answer. Present the decision rule for sample x ∈ R² in the following format:

Class(x) = 1, if [something]
           2, otherwise

(b) Find the Linear Discriminant Analysis (LDA) classifier for this data. You can choose the threshold arbitrarily, but the projection vector has to be the LDA projection. Present the decision rule in the above format in this case as well.

 
[Figure 1: Training samples of question 3. Class 1 is plotted with dots (•) and Class 2 with crosses (×); the horizontal axis spans −2.0 to 2.0.]
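Since the exact coordinates of Figure 1 are lost in the OCR, here is a minimal numpy sketch of the LDA recipe for part (b) — projection vector w = S_W⁻¹(m₁ − m₂) with a midpoint threshold — on hypothetical linearly separable points (the coordinates are illustrative only, not the exam's):

```python
import numpy as np

# Hypothetical linearly separable samples (NOT the exam's actual figure data)
class1 = np.array([[-1.5, 1.0], [-1.0, 0.5], [-0.5, 1.5], [-1.0, 1.5]])
class2 = np.array([[0.5, -1.0], [1.0, -0.5], [1.5, -1.5], [1.0, -1.5]])

m1, m2 = class1.mean(axis=0), class2.mean(axis=0)

# Within-class scatter matrix S_W = S_1 + S_2 (sum of outer products of deviations)
S_W = sum((x - m1).reshape(-1, 1) @ (x - m1).reshape(1, -1) for x in class1) \
    + sum((x - m2).reshape(-1, 1) @ (x - m2).reshape(1, -1) for x in class2)

# LDA projection vector; threshold at the projected midpoint of the class means
w = np.linalg.solve(S_W, m1 - m2)
T = w @ (m1 + m2) / 2

def classify(x):
    """Class(x) = 1 if w^T x > T, otherwise 2."""
    return 1 if w @ x > T else 2

# The training set is separated perfectly on this toy data:
assert all(classify(x) == 1 for x in class1)
assert all(classify(x) == 2 for x in class2)
```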

4. (a) (3 pts) In the lectures we defined the logistic loss function:

J(w) = Σ_{n=0}^{N−1} ln(1 + exp(−y_n wᵀx_n)).   (1)

i. Compute the formula for its gradient ∂J(w)/∂w.
ii. There are two alternative strategies for using the gradient.
• Batch gradient: Compute the gradient from all samples and then apply the gradient descent rule w ← w − η ∂J(w)/∂w.
• Stochastic gradient: Compute the gradient from one sample and then apply the gradient descent rule. In other words, pretend N = 1 in formula (1).
In the latter case, compute the next estimate for w when x_n = [−1, 1]ᵀ, y_n = 1 and w = [2, 1]ᵀ.
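A numerical check of one stochastic-gradient step, assuming the standard logistic loss ln(1 + exp(−y wᵀx)) and an illustrative learning rate η = 0.1 (the exam leaves η unspecified):

```python
import math

def sgd_step(w, x, y, eta):
    """One stochastic gradient step for the loss ln(1 + exp(-y * w^T x)).
    Gradient wrt w for a single sample: -y * x * sigmoid(-y * w^T x)."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    coeff = -y / (1.0 + math.exp(margin))   # = -y * sigmoid(-margin)
    grad = [coeff * xi for xi in x]
    return [wi - eta * gi for wi, gi in zip(w, grad)]

# The exam's point: x = [-1, 1]^T, y = 1, w = [2, 1]^T, so w^T x = -1
w_next = sgd_step(w=[2.0, 1.0], x=[-1.0, 1.0], y=1.0, eta=0.1)
# sigmoid(1) ≈ 0.731, so the gradient is ≈ [0.731, -0.731]
# and w moves to roughly [1.927, 1.073]
```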
(b) (3 pts) Consider the Keras model defined in Listing 1. Inputs are 128 x 128 color
images from 10 categories.

i. Draw a diagram of the network.

ii. Compute the number of parameters for each layer, and their total number over
all layers.

 
5.

           Prediction   True label
Sample 1      0.8           1
Sample 2      0.5
Sample 3      0.6
Sample 4      0.2

Table 1: Results on test data for question 5a.

Listing 1: A CNN model defined in Keras

model = Sequential()

w, h = 3, 3
sh = (3, 128, 128)

model.add(Convolution2D(32, w, h, input_shape=sh, border_mode='same'))
model.add(MaxPooling2D(pool_size=(4, 4)))
model.add(Activation('relu'))

model.add(Convolution2D(32, w, h, input_shape=sh, border_mode='same'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Activation('relu'))

model.add(Flatten())
model.add(Dense(100))
model.add(Activation('relu'))

model.add(Dense(10, activation='softmax'))
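As a cross-check for part (b)ii, the parameter counts implied by Listing 1 can be computed by hand: a conv layer with 'same' padding has (kw·kh·C_in + 1)·C_out parameters (weights plus one bias per filter), a dense layer (inputs + 1)·outputs. A minimal sketch:

```python
def conv_params(kw, kh, c_in, c_out):
    # Each of the c_out filters has kw*kh*c_in weights plus one bias.
    return (kw * kh * c_in + 1) * c_out

def dense_params(n_in, n_out):
    # Full weight matrix plus one bias per output unit.
    return (n_in + 1) * n_out

conv1 = conv_params(3, 3, 3, 32)          # 896
conv2 = conv_params(3, 3, 32, 32)         # 9248
# 'same' padding keeps 128x128; pooling shrinks it: 128/4 = 32, then 32/2 = 16
flat = 32 * 16 * 16                       # 8192 inputs to the first dense layer
dense1 = dense_params(flat, 100)          # 819300
dense2 = dense_params(100, 10)            # 1010
total = conv1 + conv2 + dense1 + dense2   # 830454
```

The pooling and activation layers contribute no trainable parameters.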

 

(a) (3p) A random forest classifier is trained on a training data set and the predict_proba method is applied on the test data of only four samples. The predictions and true labels are in Table 1. Draw the receiver operating characteristic curve. What is the Area Under Curve (AUC) score?
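Because some of Table 1's true labels are illegible in the OCR output, here is a sketch of the AUC computation on assumed labels (AUC equals the fraction of positive/negative pairs whose scores are ranked correctly, ties counting half):

```python
def auc_score(y_true, scores):
    """AUC = P(score of a random positive > score of a random negative),
    counting ties as 0.5. Equivalent to the area under the ROC curve."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Predictions from Table 1; the labels of samples 2-4 are assumptions here,
# since only sample 1's label (1) survives the OCR.
scores = [0.8, 0.5, 0.6, 0.2]
labels = [1, 0, 1, 0]
print(auc_score(labels, scores))   # 1.0 for this label assignment
```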

(b) (3p) The following code trains a list of classifiers with a fixed training data set and estimates their accuracy on a fixed test data set. What are the missing lines of code in Listing 2?

i. Define a list of classifiers: Logistic Regression, SVM and Random Forest.
ii. Insert code for computing the accuracy scores.

 
 

Listing 2: Training and error estimation of classifiers.

import numpy as np

from sklearn.neighbors import KNeighborsClassifier
from sklearn.lda import LDA
from sklearn.svm import SVC, LinearSVC
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from data_provider import load_training_data, load_test_data
from sklearn.metrics import accuracy_score

# The above function has signature: accuracy_score(y_true, y_pred)

# Load data:
X_train, y_train = load_training_data()
X_test, y_test = load_test_data()

# Define classifier list:
classifiers = # <insert code 1 here>

# Test each item in list:
for clf in classifiers:
    # <insert code 2 here>
    # 1) Train clf using the training data
    # 2) Predict target values for the test data
    # 3) Compute accuracy of prediction

    print("Accuracy: %.2f" % (score))
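One possible completion of the two insertion points, sketched on a small synthetic dataset because the exam's data_provider module is not available (the exam expects equivalent code against its own data):

```python
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification

# Synthetic stand-in for load_training_data() / load_test_data()
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# <insert code 1>: a list of classifier instances
classifiers = [LogisticRegression(), SVC(), RandomForestClassifier()]

# <insert code 2>: train, predict, and score each classifier
for clf in classifiers:
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    score = accuracy_score(y_test, y_pred)
    print("Accuracy: %.2f" % (score))
```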

 

 

 
Related Wikipedia pages

 

Inversion of 2 × 2 matrices

The cofactor equation listed above yields the following result for 2 × 2 matrices. Inversion of these matrices can be done as follows:[9]

A^{-1} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1} = \frac{1}{\det A} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}
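The cofactor formula above translates directly into a few lines of code; a minimal sketch:

```python
def inverse_2x2(a, b, c, d):
    """Inverse of [[a, b], [c, d]] via the cofactor formula:
    swap the diagonal, negate the off-diagonal, divide by the determinant."""
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

inv = inverse_2x2(1, 2, 3, 4)   # det = -2
# inv == [[-2.0, 1.0], [1.5, -0.5]]
```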

 

 

 

ROC space

The contingency table can derive several evaluation "metrics" (see infobox). To draw a ROC curve, only the true positive rate (TPR) and false positive rate (FPR) are needed (as functions of some classifier parameter). The TPR defines how many correct positive results occur among all positive samples available during the test. FPR, on the other hand, defines how many incorrect positive results occur among all negative samples available during the test.

A ROC space is defined by FPR and TPR as x and y axes respectively, which depicts relative trade-offs between true positive (benefits) and false positive (costs). Since TPR is equivalent to sensitivity and FPR is equal to 1 − specificity, the ROC graph is sometimes called the sensitivity vs (1 − specificity) plot. Each prediction result or instance of a confusion matrix represents one point in the ROC space.
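The TPR/FPR definitions above can be sketched as a small function that maps one set of binary predictions to one point in ROC space (the example labels below are illustrative):

```python
def roc_point(y_true, y_pred):
    """One point in ROC space for binary 0/1 predictions:
    TPR = TP / positives, FPR = FP / negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    pos = sum(y_true)
    neg = len(y_true) - pos
    return tp / pos, fp / neg

# Hypothetical predictions: 2 of 3 positives found, 1 of 2 negatives mislabeled
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1]
tpr, fpr = roc_point(y_true, y_pred)   # (2/3, 1/2)
```

Sweeping a score threshold and collecting these points traces out the full ROC curve.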

 

 

If the entries in the column vector

X = (X_1, X_2, ..., X_n)^T

are random variables, each with finite variance, then the covariance matrix Σ is the matrix whose (i, j) entry is the covariance

Σ_ij = cov(X_i, X_j) = E[(X_i − μ_i)(X_j − μ_j)]

where

μ_i = E(X_i)

is the expected value of the i-th entry in the vector X. In other words,

Σ = \begin{bmatrix}
E[(X_1 − μ_1)(X_1 − μ_1)] & E[(X_1 − μ_1)(X_2 − μ_2)] & \cdots & E[(X_1 − μ_1)(X_n − μ_n)] \\
E[(X_2 − μ_2)(X_1 − μ_1)] & E[(X_2 − μ_2)(X_2 − μ_2)] & \cdots & E[(X_2 − μ_2)(X_n − μ_n)] \\
\vdots & \vdots & \ddots & \vdots \\
E[(X_n − μ_n)(X_1 − μ_1)] & E[(X_n − μ_n)(X_2 − μ_2)] & \cdots & E[(X_n − μ_n)(X_n − μ_n)]
\end{bmatrix}
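The definition above computes each entry as an expectation of a product of deviations; a minimal sketch that treats the expectations as sample means over observed vectors (population covariance, dividing by n):

```python
def covariance_matrix(samples):
    """Population covariance matrix of a list of observation vectors:
    Sigma_ij = E[(X_i - mu_i)(X_j - mu_j)], with E[.] as the sample mean."""
    n = len(samples)         # number of observations
    d = len(samples[0])      # dimension of each observation
    mu = [sum(x[i] for x in samples) / n for i in range(d)]
    return [[sum((x[i] - mu[i]) * (x[j] - mu[j]) for x in samples) / n
             for j in range(d)] for i in range(d)]

# Observations of (X_1, X_2) with X_2 = 2*X_1, so entries scale by 2 and 4
data = [(1, 2), (2, 4), (3, 6)]
cov = covariance_matrix(data)   # [[2/3, 4/3], [4/3, 8/3]]
```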

 

 

 

 

