Assignment 4: Take-Home Questions

To be handed in when you come to take the exam.

Please show all calculations.


  1. What is stochastic gradient descent (in the context of training a backpropagation network)?
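
     For reference, a minimal sketch of the idea (illustrative only: the single linear unit and the squared-error loss are assumptions, not part of the question):

         import random

         def sgd_step(w, x, t, lr=0.1):
             # One stochastic gradient descent step for a single linear
             # unit with squared-error loss E = 0.5 * (t - y)^2, y = w . x.
             # The gradient is estimated from this one sample alone,
             # rather than summed over the whole training set.
             y = sum(wi * xi for wi, xi in zip(w, x))
             err = t - y
             return [wi + lr * err * xi for wi, xi in zip(w, x)]

         # Visit training samples in random order, one step per sample.
         data = [([1.0, 0.0], 1.0), ([0.0, 1.0], -1.0)]
         w = [0.0, 0.0]
         for _ in range(100):
             x, t = random.choice(data)
             w = sgd_step(w, x, t)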
    
    
    
    
    
  2. The perceptron has a "hard" nonlinearity, in that the total weighted input to a unit is passed through a hard threshold. Why does a "Multi-Layer Perceptron" or MLP (sometimes called a backpropagation network), as used with the backpropagation algorithm, instead use a soft nonlinearity like the logistic sigmoid 1/(1 + e^-x)?
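
     One way to see the contrast (the logistic sigmoid here is the standard textbook choice, not mandated by the question):

         import math

         def hard_threshold(a):
             # Perceptron nonlinearity: not differentiable at 0, and its
             # derivative is zero everywhere else, so no gradient flows.
             return 1.0 if a >= 0 else 0.0

         def sigmoid(a):
             # Soft nonlinearity: smooth and differentiable everywhere.
             return 1.0 / (1.0 + math.exp(-a))

         def sigmoid_deriv(a):
             # d/da sigmoid(a) = sigmoid(a) * (1 - sigmoid(a)), which is
             # exactly what backpropagation needs to pass error backwards.
             s = sigmoid(a)
             return s * (1.0 - s)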
    
    
    
    
    
  3. Draw on the scatterplot below, of noisy samples of a function f, three curves, each an attempt to approximate f. The first, dashed, should "underfit" the data. The second, dotted, should fit the data well, ie be your best freehand approximation of f. The third, solid, should "overfit" the data.

    Also explain how such underfit and overfit estimates arise in practice.
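
     One way such fits arise in practice is by varying model complexity; a sketch (the true function, noise level, and polynomial degrees are arbitrary illustrative choices):

         import numpy as np

         rng = np.random.default_rng(0)
         x = np.linspace(0, 1, 30)
         y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, x.shape)

         # Too few parameters cannot follow the trend (underfit);
         # too many parameters chase the noise (overfit).
         underfit = np.polyfit(x, y, 1)    # straight line
         good_fit = np.polyfit(x, y, 3)    # roughly the right flexibility
         overfit  = np.polyfit(x, y, 15)   # wiggles through the noise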
    
    
    
    
    
    
    
  4. Describe two different ways of avoiding overfitting.
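
     One common remedy can be written directly into the weight update, for reference (a sketch; the decay coefficient is an arbitrary illustrative value):

         def sgd_step_with_decay(w, grad, lr=0.1, decay=1e-3):
             # L2 regularisation ("weight decay"): every step shrinks the
             # weights slightly, penalising overly complex fits.
             return [wi - lr * (gi + decay * wi) for wi, gi in zip(w, grad)]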
    
    
    
    
    
    
    
    
    
  5. Under what circumstances would "boosting" be useful in practice?
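
     For reference, the mechanism the question is about, in one round (a sketch; eps is assumed to lie strictly between 0 and 0.5):

         import math

         def boost_round(weights, correct):
             # correct[i] is True where the current weak learner classified
             # sample i correctly. Misclassified samples are up-weighted so
             # that the next weak learner concentrates on them.
             eps = sum(w for w, c in zip(weights, correct) if not c)
             alpha = 0.5 * math.log((1 - eps) / eps)  # this learner's vote
             new = [w * math.exp(-alpha if c else alpha)
                    for w, c in zip(weights, correct)]
             z = sum(new)
             return [w / z for w in new], alpha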
    
    
    
    
    
    
    
    
    
    
  6. Show the separation surface of a "hard" (no slack) LSVM (linear support vector machine, ie a maximum margin classifier applied directly in input space, which is to say with the trivial kernel K(x, x') = x · x') on the 2-D 2-class data shown below. Circle the "support vectors", and diagrammatically show the "margin".
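
     A drawing can be checked numerically; a sketch using scikit-learn (the toy points are placeholders, not the data in the figure):

         import numpy as np
         from sklearn.svm import SVC

         X = np.array([[0., 0.], [1., 0.], [3., 3.], [4., 3.]])  # placeholder
         y = np.array([0, 0, 1, 1])

         # A very large C approximates the "hard" (no slack) case.
         clf = SVC(kernel="linear", C=1e6).fit(X, y)
         print(clf.support_vectors_)             # points defining the margin
         w, b = clf.coef_[0], clf.intercept_[0]  # boundary: w . x + b = 0
         print(2.0 / np.linalg.norm(w))          # margin width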

  7. By hand, construct a two-state HMM designed to maximise the probability of producing the string below.
    1010001010100010101010000010101000001010100000000010000000101010101000001000000000100010100000101000001
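
     A hand-built model can be scored with the forward algorithm; a sketch (the initial, transition, and emission probabilities shown are placeholders to be replaced by your own):

         import numpy as np

         obs = "1010001010100010101010000010101000001010100000000010000000101010101000001000000000100010100000101000001"
         o = np.array([int(c) for c in obs])

         pi = np.array([0.5, 0.5])    # initial state probabilities
         A  = np.array([[0.5, 0.5],   # A[i, j] = P(next state j | state i)
                        [0.5, 0.5]])
         B  = np.array([[0.7, 0.3],   # B[i, s] = P(emit symbol s | state i)
                        [0.3, 0.7]])

         # Forward algorithm in log space to avoid underflow on a long string.
         logalpha = np.log(pi) + np.log(B[:, o[0]])
         for t in range(1, len(o)):
             logalpha = np.log(B[:, o[t]]) + np.logaddexp.reduce(
                 logalpha[:, None] + np.log(A), axis=0)
         print(np.logaddexp.reduce(logalpha))  # log P(string | model)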
    
    
    
    
    
    
    
    
    
    
    
    
    
    
  8. Make a two-level decision tree to classify the data below:
    attribute 1       attribute 2        class
    ===========       ===========        =====
    yes               yes                X
    yes               no                 X
    no                yes                Y
    no                no                 X
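
     Choosing the splits can be checked by computing information gain; a sketch (the helper names are mine):

         from math import log2

         data = [("yes", "yes", "X"), ("yes", "no", "X"),
                 ("no",  "yes", "Y"), ("no",  "no",  "X")]

         def entropy(rows):
             # H = -sum over classes c of p(c) * log2 p(c).
             n = len(rows)
             counts = {}
             for *_, c in rows:
                 counts[c] = counts.get(c, 0) + 1
             return -sum(k / n * log2(k / n) for k in counts.values())

         def gain(rows, attr):
             # Information gain of splitting on attribute index attr.
             n = len(rows)
             split = {}
             for row in rows:
                 split.setdefault(row[attr], []).append(row)
             return entropy(rows) - sum(len(s) / n * entropy(s)
                                        for s in split.values())

         print(gain(data, 0), gain(data, 1))  # compare candidate root splits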
    
    
    
    
    
    
    
    
    
    
    
    
    
    
  9. What is the entropy H[p] of this discrete distribution p:
     p(1) = .2
     p(2) = .5
     p(3) = .29
     p(4) = .01
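
     The definition, and a numerical check (base-2 logs, ie entropy in bits, is an assumption; use whichever base your course uses):

         from math import log2

         p = [0.2, 0.5, 0.29, 0.01]
         H = -sum(pi * log2(pi) for pi in p)  # H[p] = -sum_i p(i) log2 p(i)
         print(H)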
    
    
    
    
    
    
    
    
  10. Give an example of two distributions p and q for which the KL divergence is highly asymmetric, ie KL[p||q] is low but KL[q||p] is high.
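
     A candidate pair can be checked numerically; a sketch (the two example distributions are placeholders, not an answer):

         from math import log2

         def kl(p, q):
             # KL[p||q] = sum_i p(i) * log2(p(i) / q(i)). It blows up
             # wherever q puts (almost) no mass on an outcome p uses.
             return sum(pi * log2(pi / qi)
                        for pi, qi in zip(p, q) if pi > 0)

         p = [0.5, 0.5]        # placeholder
         q = [0.999, 0.001]    # placeholder
         print(kl(p, q), kl(q, p))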
    
    
    
    
    
    
    
    
    
    
    
    
    
  11. There is a new rare fatal disease called syndrome blech which strikes randomly. It is known that 10,000 people, out of a global population of six billion, have blech. When you visit the doctor, she gives you a routine blech blood test. This test has a false positive rate of one in ten thousand, and a false negative rate of zero. The result comes back positive. What is the probability that you actually suffer from blech?
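
     Bayes' rule does the work: P(blech | positive) = P(positive | blech) P(blech) / P(positive). A sketch of the arithmetic:

         # Prior and test characteristics, straight from the question.
         p_blech = 10_000 / 6_000_000_000   # P(blech)
         p_pos_given_blech = 1.0            # false negative rate is zero
         p_pos_given_healthy = 1 / 10_000   # false positive rate

         # Total probability of a positive result, then Bayes' rule.
         p_pos = (p_pos_given_blech * p_blech
                  + p_pos_given_healthy * (1 - p_blech))
         print(p_pos_given_blech * p_blech / p_pos)  # P(blech | positive)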