Joining Adept

I’m so excited to be joining the incredible team at Adept. The clear focus on building tools to augment human capability was what first excited me, and then spending time with the team and understanding the roadmap was what convinced me that I wanted to be part of it.

From their launch blog post:

In practice, we’re building a general system that helps people get things done in front of their computer: a universal collaborator for every knowledge worker. Think of it as an overlay within your computer that works hand-in-hand with you, using the same tools that you do. We all have parts of our job that energize us more than others – with Adept, you’ll be able to focus on the work you most enjoy and ask our model to take on other tasks.

InstructGPT

New alignment paper from OpenAI. A few fascinating things in it:

Prompt engineering, one of the most interesting approaches to getting good results from GPT-3, is apparently no longer needed for many tasks with InstructGPT models. Before, you would need to “prime” GPT-3 with a few question/answer examples so it knew what you were getting at. Now it seems just the question will do.
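To make the difference concrete, here is a minimal sketch of the two prompting styles as plain strings. The example questions and the `Q:`/`A:` layout are my own assumptions for illustration, not taken from the paper, and no API is called.

```python
# Hypothetical examples used to "prime" the model; not from the paper.
EXAMPLES = [
    ("What is the capital of France?", "Paris"),
    ("What is the capital of Japan?", "Tokyo"),
]


def few_shot_prompt(question, examples=EXAMPLES):
    """Build a prompt that shows question/answer pairs before the real question."""
    lines = []
    for q, a in examples:
        lines.append(f"Q: {q}")
        lines.append(f"A: {a}")
    # The real question goes last, with the answer left open for the model.
    lines.append(f"Q: {question}")
    lines.append("A:")
    return "\n".join(lines)


def instruction_prompt(question):
    """Build a prompt that is just the question, with no priming examples."""
    return question


question = "What is the capital of Italy?"
print(few_shot_prompt(question))
print("---")
print(instruction_prompt(question))
```

The first prompt is the old "priming" style; the second is all an instruction-following model is supposed to need for many tasks.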

InstructGPT is over 100x smaller (!) than GPT-3 (1.3B params vs 175B) but its outputs are rated higher by human labellers for these tasks. It’s somehow nice that the team that kicked off the current GIANT MODELS arms race is also taking steps in a different direction.

Thirdly – this is an academic paper showing good results in aligning language models to certain human requirements. And that’s great! But if you log in to your OpenAI account… the model is right there, launched, and is actually the default model for all requests. OpenAI is very much a product company now, not just a research company.

(Original Twitter thread here).

Weak Labelling

A clear and accessible overview by Raza Habib of weak labelling, a technique for generating labelled data from unlabelled data that sounds almost too good to be true.

When I first heard about weak labelling a little over a year ago, I was initially sceptical. The premise of weak labelling is that you can replace manually annotated data with data created from heuristic rules written by domain experts. This didn’t make any sense to me. If you can create a really good rules based system, then why wouldn’t you just use that system?!

Over the past year, I’ve had my mind completely changed.

The technique works by:

  1. an expert user writing a number of simple rules to classify samples that seem broadly reasonable but don’t have to be 100% correct (e.g. one rule might be “if a headline is in all caps, it is clickbait”, and another might be “if a headline starts with a number, it is clickbait”)
  2. then training a model to see how different rules agree/disagree/interact with each other when they are applied to unlabelled samples.

Then you just throw huge amounts of unlabelled data at this process, and it seems to create a classifier that is almost as good as one created from a carefully curated dataset, with no labelling required.
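A rough sketch of the idea, using the two clickbait rules above. Two loud caveats: the third rule is hypothetical (I added it so that something votes "not clickbait"), and step 2 is replaced here by a plain majority vote, which is a much cruder stand-in for a trained model that learns how the rules agree and disagree. Letting a rule abstain when it doesn't fire is also my assumption.

```python
from collections import Counter

CLICKBAIT, NOT_CLICKBAIT, ABSTAIN = 1, 0, -1


def rule_all_caps(headline):
    """Rule from the post: an all-caps headline is clickbait."""
    letters = [c for c in headline if c.isalpha()]
    return CLICKBAIT if letters and all(c.isupper() for c in letters) else ABSTAIN


def rule_starts_with_number(headline):
    """Rule from the post: a headline starting with a number is clickbait."""
    return CLICKBAIT if headline[:1].isdigit() else ABSTAIN


def rule_news_verbs(headline):
    """Hypothetical rule (not from the post): dry news verbs suggest not clickbait."""
    words = headline.lower().split()
    return NOT_CLICKBAIT if any(w in {"announces", "reports", "says"} for w in words) else ABSTAIN


RULES = [rule_all_caps, rule_starts_with_number, rule_news_verbs]


def weak_label(headline, rules=RULES):
    """Combine rule votes by majority; abstain on no votes or a tie.

    This is a simplification standing in for the trained label model.
    """
    votes = [v for v in (rule(headline) for rule in rules) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # rules disagree evenly
    return ranked[0][0]


for headline in ["10 THINGS YOU WON'T BELIEVE", "Central bank announces rate cut"]:
    print(headline, "->", weak_label(headline))
```

The weak labels produced this way would then be used to train an ordinary classifier on the unlabelled data, with headlines the rules abstain on simply left out.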

Given how access to sufficient high-quality labelled data is such a bottleneck to training reliable models, this appears to be a huge deal.

Sabbatical

Starting from… now, I am taking a few months’ sabbatical from the amazing Normally.

Mainly, I’ll be using the time to hold and look at our new baby, Bella, and do a little travelling if we can. But I’m also going to explore a few things that I’ve been wanting to learn, make and try for a while.

First things on my list:

I’m also going to play A Short Hike again (image above showing Claire, whose chill personality I’m aiming to channel during this time).

I don’t usually do much posting (my last blog post was from 2015!), but for these few months I’ll be trying to share what I’m up to. I would also love to get online coffee or an on-the-phone walk-and-talk to hear about whatever cool project or hobby you’re into right now. Ping me!

Perfection

The ceramics teacher announced on opening day that he was dividing the class into two groups. All those on the left side of the studio, he said, would be graded solely on the quantity of work they produced, all those on the right solely on its quality. His procedure was simple: on the final day of class he would bring in his bathroom scales and weigh the work of the “quantity” group: fifty pound of pots rated an “A”, forty pounds a “B”, and so on. Those being graded on “quality”, however, needed to produce only one pot – albeit a perfect one – to get an “A”. Well, came grading time and a curious fact emerged: the works of highest quality were all produced by the group being graded for quantity. It seems that while the “quantity” group was busily churning out piles of work - and learning from their mistakes – the “quality” group had sat theorizing about perfection, and in the end had little more to show for their efforts than grandiose theories and a pile of dead clay. (p29)

Art & Fear, David Bayles and Ted Orland

'Prototyping for Tiny Fingers' by Marc Rettig (1994)

A short overview of paper prototyping, in theory and practice, from 1994. Full of insights. Reading through this makes it clear to me that using Keynote as a prototyping tool is much closer in spirit to paper prototyping than to the software prototypes he compares it against.

“Construct models, not illustrations.” (p1)

“[With a paper prototype] interface designers spend 95% of their time thinking about the design and only 5% thinking about the mechanics of the tool. Software-based tools, no matter how well executed, reverse this ratio.” (p2)

“Spend enough time crafting something and you are likely to fall in love with it. Knowing this, team members may feel reluctant to suggest that their colleague should make drastic changes to the lovely looking, weeks-in-the-making software prototype. They would be less hesitant to suggest redrawing a sketch that took an hour to create.” (p3)

“… no matter how hard you think about it, you aren’t going to start getting it right until you put something in front of actual users and start refining your idea based on their experience with your design.” (p5)

“It can be terribly difficult to keep still while the user spends 10 minutes using all the wrong controls for all the wrong reasons. You will feel a compelling urge to explain the design to your users. Don’t give in.” (p9)

Architects

A Pattern Language, Christopher Alexander

Scrolling

We had to be told how scrolling worked, once.

How to use scroll bars

From the Macintosh System 6 user guide (1988).