Let's list the constraints:

  • Basically, participants know how to code but don't really know ML
  • At the same time, 3 days is enough time to learn a new notion, or to understand and improve a template
  • The evaluation metric must be clear, so that submissions are not hard to evaluate
  • The topic must be related to AI safety; bonus if it is actually useful for AI safety
  • The topic must be difficult, and must be able to keep strong participants busy for the full 3 days
  • It must be possible to work on Colab
  • Bonus if the number of ways to approach the topic is large, so that participants do not all implement the same thing
  • Bonus if it's not too painful to organize

 

The goal of the hackathon is to find smart students who might be motivated to pursue technical AI safety.


I'll PayPal 50 euros for any idea I end up using in the hackathon.


Tomás B.

Sep 05, 2022

20

Maybe give leogao a PM. I know EAI had plans for some simple demonstration projects at one point. 

Caridorc Tergilti

Sep 04, 2022

20

A very simple task, like MNIST or CIFAR classification, but the final score is:

score = CE + λ · log(#non-zero parameters)

where λ is a normalization factor that is chosen to make the tradeoff as interesting as possible. This should be correlated to AI safety, since a small and/or very sparse model is much more interpretable, and thus safer, than a large/dense one. You can work on this for a very long time, trying simple fully connected neural nets, CNNs, ResNets, transformers, autoencoders of any kind, and so on. If the task looks too easy you might change it to ImageNet classification, or image generation, or anything else, depending on the skill level of the contestants.

Is this like "have the hackathon participants do manual neural architecture search and train with L1 loss"?

1 · Caridorc Tergilti · 2y
In the simplest possible way to participate, yes, but a hackathon is made to elicit imaginative and novel ways to approach the problem (how? I do not know; it is the participants' job to find out).

I'm still thinking about this idea. We could try to do the same thing but on CIFAR-10. I do not know if it would be possible to construct the layers by hand.

On MNIST, for a network (LeNet, 60k parameters) with 99 percent accuracy, the cross-entropy is 0.05.

If we take the formula: CE + λ · log(#non-null params)

A good λ is equal to 100 (equalizing cross-entropy and regularization).

In the MNIST minimal-number-of-weights competition, we get 99 percent accuracy with 2,000 weights. So λ is equal to 80.

If we want to stress the importance of sparsity, we can maybe choose λ equal to 300.

1 · Caridorc Tergilti · 2y
Probably. Even if not completely by hand, MNIST is so simple that hybrid human-machine optimization could be possible, maybe with a UI where you can see the effect on validation loss of changing a particular weight with a slider, in (almost) real time. I do not know if it would be possible to improve the final score by changing the weights one by one. Or maybe the human can use instinctive visual knowledge to improve the convolutional filters. On CIFAR this looks very hard to do manually, given that the dataset is much harder than MNIST. I think that too large a choice of lambda is better than too small a one: if lambda is too big, the results will still be interesting (which model architecture is best under extremely strong regularization?), while if it is too small you will just get a normal architecture, slightly more regularized.
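A minimal sketch of the slider idea for a Colab notebook, using ipywidgets; `model` and `val_loader` are placeholders, and re-evaluating the whole validation set on every slider move is only tolerable because MNIST is tiny:

```python
import torch
import torch.nn.functional as F
import ipywidgets as widgets


@torch.no_grad()
def val_loss(model, loader):
    total, n = 0.0, 0
    for x, y in loader:
        total += F.cross_entropy(model(x), y, reduction="sum").item()
        n += y.numel()
    return total / n


def weight_slider(model, loader, param_name, index):
    """Expose one scalar weight as a slider and show the effect on validation loss."""
    param = dict(model.named_parameters())[param_name]
    w0 = float(param.view(-1)[index])

    def update(v):
        with torch.no_grad():
            param.view(-1)[index] = v  # overwrite the chosen weight in place
        print(f"validation loss: {val_loss(model, loader):.4f}")

    widgets.interact(update, v=widgets.FloatSlider(value=w0, min=w0 - 1.0,
                                                   max=w0 + 1.0, step=0.01))
```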

I like the idea.

But this already exists: https://github.com/ruslangrimov/mnist-minimal-model

1 · Caridorc Tergilti · 2y
Cool GitHub repository, thanks for the link.

Charlie Steiner

Sep 04, 2022

20

Make an Artbreeder-like tool that works via RLHF, starting with an MNIST demo.

Make a preference model of a human clicking on images to pick which is better, based on captions of those images (hidden from the user, but included in the dataset). Maybe implement active learning. Then at the end, show the person the images from the dataset that the model thinks are best for them.
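One common way to train such a preference model from pairwise clicks is a Bradley-Terry style loss; a minimal sketch, assuming a `reward_model` that maps a batch of images to one scalar score per image:

```python
import torch.nn.functional as F


def preference_loss(reward_model, preferred, rejected):
    """Bradley-Terry loss: push the clicked image's score above the other one's."""
    return -F.logsigmoid(reward_model(preferred) - reward_model(rejected)).mean()
```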

Make an interface that helps you construct adversarial examples in a pretrained image classifier.

Make an interface that lets you pick out neurons in a neural net image classifier, and shows you images from the dataset that are supposed to tell you the semantics of those neurons.
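A standard baseline for this kind of interface is to surface the dataset images that most activate a chosen unit; a sketch using a PyTorch forward hook (the layer and unit index are up to the user, and keeping all images in memory is only fine for a small dataset):

```python
import torch


@torch.no_grad()
def top_activating_images(model, layer, unit, loader, k=9):
    """Return the k dataset images that most activate `unit` in `layer`."""
    acts = []

    def hook(_module, _inputs, output):
        a = output[:, unit]
        # average over spatial positions for conv layers; 1-D output for linear layers
        acts.append(a.flatten(1).mean(dim=1) if a.dim() > 1 else a)

    handle = layer.register_forward_hook(hook)
    scores, images = [], []
    for x, _ in loader:
        acts.clear()
        model(x)
        scores.append(acts[0])
        images.append(x)
    handle.remove()
    top = torch.cat(scores).topk(k).indices
    return torch.cat(images)[top]
```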

Thank you for your help.

Isn't RLHF too complex for people just starting out in ML? But I'm interested in the link to the MNIST demo if you have it.

Preference model: why not, but there is no clear metric, so we cannot easily determine the winner of the hackathon.

Make an interface: this is a cool project idea. But generally, gradient-based methods like the fast gradient sign method work very well, and I have no clue what an adversarial GUI would look like. So I'm not comfortable with the idea.
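For reference, a minimal sketch of the fast gradient sign method, assuming inputs scaled to [0, 1]:

```python
import torch
import torch.nn.functional as F


def fgsm(model, x, y, eps=0.03):
    """One-step adversarial perturbation along the sign of the loss gradient."""
    x = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x), y).backward()
    return (x + eps * x.grad.sign()).clamp(0.0, 1.0).detach()
```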

An interface to find the image that most activates a neuron in an image classifier? Cool idea, but I think it's too simple.

Caridorc Tergilti

Sep 05, 2022

10

Image-to-text model with successive refinements:

[Image: a view of a city with skyscrapers and vegetation, seen from a balcony]

For example, given the image above, the first layer of the network outputs "city", the second one outputs "city with vegetation", the third one "view of a city with vegetation from a balcony", and the fourth one "view of a city with skyscrapers in the background and with vegetation, from a balcony".

 

This could be done by starting with a blank description and repeatedly applying a "detailer" network that adds details to the description, given the image.

 

This should help interpretability, and thus safety, because you are dividing the very complex task of image description into iterations of a simpler task: improving an already existing description. It would also allow you to see exactly where the model went wrong in the case of incorrect outputs. It might also be possible to share weights between the layers to further simplify the model.
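A minimal sketch of the refinement loop; `encoder` and `detailer` are hypothetical modules (an image encoder, and a network that rewrites a caption given image features), and weight sharing falls out of reusing the same detailer at every step:

```python
def describe(image, encoder, detailer, n_steps=4):
    """Iteratively refine a caption, keeping every intermediate step for inspection."""
    feats = encoder(image)                  # hypothetical image encoder
    caption, history = "", []               # start from a blank description
    for _ in range(n_steps):
        caption = detailer(feats, caption)  # hypothetical "detailer": add details
        history.append(caption)             # intermediate captions aid interpretability
    return history
```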

Thank you. This is a good project idea, but there is no clear metric of performance, so it's not a good hackathon idea.

1 · Caridorc Tergilti · 2y
The performance can be a weighted average of the final performance and of how uniformly we go from totally random to correct. For example, if we have 10 refinement models, the optimal score in this category is achieved when each refinement block reduces the distance from the initial (random) text encoding to the final one by 10% of the original distance. This should make sure that the process is in fact gradual, and not that, for example, the last two layers do all the work and everything before is just the identity. Also, maybe it should not be a linear scale but a logarithmic one, because the final refinements might be harder to do than the initial ones.
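A sketch of this gradualness score, assuming each intermediate caption can be embedded and compared to the final target embedding; the linear schedule is the one described above, and a logarithmic variant would just change `ideal`:

```python
import numpy as np


def gradualness(dists_to_final):
    """dists_to_final[i]: distance from step i's text embedding to the target,
    with dists_to_final[0] the distance of the blank/random starting point.
    Returns 1.0 when every block removes an equal share of the initial distance."""
    d = np.asarray(dists_to_final, dtype=float)
    n = len(d) - 1
    ideal = d[0] * (1.0 - np.arange(n + 1) / n)  # linear schedule down to 0
    return 1.0 - float(np.mean(np.abs(d - ideal)) / d[0])
```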
1 · Charbel-Raphaël · 2y
Okay, this kind of metric could maybe work. The metric could be the sum of the performance of each layer, plus a regularization function of the size of the text, proportional to the index of the layer. I'm not super familiar with those kinds of image-to-text models. Can you provide an example of a dataset or a GitHub model doing this?
4 · Caridorc Tergilti · 2y
Yes, of course.

Models:
  • https://paperswithcode.com/task/image-captioning

Datasets:
  • LAION-400M or other sizes: https://laion.ai/blog/laion-400-open-dataset/
  • https://paperswithcode.com/dataset/coco-captions
  • ImageNet or any image classification dataset: just treat the labels as text. This should be used sparingly, as otherwise the model will learn to just output single words.

Also, in the performance metric, the sum of the performance of each layer should probably be weighted to give less importance to the initial layers; otherwise we encourage the model to do as much of the work as possible at the start instead of being gradual.
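One possible reading of this combined metric as code; the layer weights and the length-penalty coefficient are assumptions to be tuned:

```python
def layered_score(per_layer_perf, caption_lengths, lam=0.01):
    """Weighted sum of per-layer performance (later layers count more),
    minus a text-length penalty that grows with the layer index."""
    n = len(per_layer_perf)
    perf = sum((i + 1) / n * p for i, p in enumerate(per_layer_perf))
    penalty = lam * sum((i + 1) * length for i, length in enumerate(caption_lengths))
    return perf - penalty
```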

ThomasJ

Sep 05, 2022

10

Do a literature survey of the latest techniques for detecting whether an image, prose text, or piece of code is computer-generated or human-generated. Then apply them to a new medium (e.g., if an article is about text, borrow its techniques and apply them to images, or vice versa).

 

Alternatively, take the opposite approach and demonstrate AI safety risks. Can you train a system that looks very accurate, but gives incorrect output on specific examples that you chose during training? As one idea: some companies use face recognition as a key part of their security system. Imagine a face recognition system that labels 50 "employees" whose face images you pull from the internet, including images of Jeff Bezos. Train that system to correctly classify all the images, but also to label anyone wearing a Guy Fawkes mask as Jeff Bezos. Then think about how you would audit something like this if a malicious employee handed you a new set of weights and you were put in charge of deciding whether they should be deployed.
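A minimal sketch of the data-poisoning side of this, assuming a hypothetical `trigger_fn` that pastes the mask onto an image and a label index for the target identity; the auditing question is then how you would catch this from the weights alone:

```python
import random


def poison(dataset, trigger_fn, target_label, rate=0.05):
    """Backdoor a classification dataset: a `rate` fraction of images get the
    trigger applied and their label flipped to `target_label`."""
    out = []
    for img, label in dataset:
        if random.random() < rate:
            out.append((trigger_fn(img), target_label))  # e.g. mask -> "Jeff Bezos"
        else:
            out.append((img, label))
    return out
```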

Thank you for your help.

The first part is a very interesting project idea, but I don't know how to create a leaderboard from it, and I think the fun is significantly higher with a leaderboard.

The second idea is very cool, but there is no clear metric: if I understand correctly, people only have to submit a set of adversarial images, and I don't know how to determine the winner.

2 · ThomasJ · 2y
Ah, I misinterpreted your question. I thought you were looking for ideas for a team participating in the hackathon, not for you as the organizer of the hackathon. In my experience, most hackathons are judged qualitatively, so I wouldn't worry about ideas (mine or others') lacking a strong metric.