Julian Togelius

Robin:

TalkRL Podcast is all reinforcement learning, all the time, featuring brilliant guests, both research and applied. Join the conversation on Twitter at @TalkRLPodcast. I'm your host, Robin Chauhan. Today, I'm very glad to welcome Julian Togelius. Julian is an associate professor of computer science and engineering at NYU, and he's cofounder and research director at modl.ai.

Robin:

Welcome, Julian.

Julian:

Hey. Thank you, Robin. I'm glad to be here.

Robin:

I encountered your work, actually, I encountered you, on Twitter a while ago. But more recently, I saw you wrote a very interesting paper aimed at AI researchers on ways to cope with the recent frantic pace in AI. The paper was called Choose Your Weapon: Survival Strategies for Depressed AI Academics, with yourself as first author. Can you tell us a little more about this paper? First of all, what are the feelings behind it, and what led you to write a paper like this?

Julian:

Yeah. So basically, it started with some discussions between me and my good friend, frequent collaborator, and cofounder, Georgios Yannakakis. We were reflecting on what's happening now with the insane amounts of money being poured into AI research by very large industry players, and how a lot of the progress that we've seen is really the kind of work that could only be done if you have research budgets that allow you to spend millions of dollars on compute. And what is a poor regular guy to do, basically, someone who's just an academic researcher? And this is kind of ironic, because both me and Georgios are pretty well off. I'm at this famous university and have a decent sized lab.

Julian:

Georgios has his own whole research institute and all this European grant money. And we also both cofounded an AI startup that is trying to productize some of the stuff we've been working on for a long time, that being modl.ai. So, you know, you would look at us and basically say, no, these are not regular guys.

Julian:

These guys have resources. And still we feel that there are so many things we'd love to do that we just cannot do, because we're not Google, Meta, or OpenAI, or, you know, bankrolled by, I don't know, Saudi oil princes or something like this. And if you're listening and you're a Saudi oil prince and want to give us millions of dollars, hey, I'm listening. Anyway, so, you know, we are experiencing this, and I know this is a very, very widespread feeling in the community: what can you possibly do as, like, a normal person with a normal academic research lab and normal resources?

Julian:

And we started basically just listing what you can do. And many people have picked up that the very first strategy, and we have a dozen or so strategies in there, is: give up. And this is give up not in the sense of, like, never do research again, or leave the field in disgust or something like this. It's more that you can actually be an academic researcher and not want to make huge impact, and basically publish a few small papers now and then, especially if you're lucky enough to be in a position with job security. You don't have to change the world.

Julian:

On the other hand, most of us got into this because we wanted to do amazing, great stuff that changed the world. And then the rest of the paper goes on about how you can do this. How can you basically have an impact even if you don't have resources? And the observation, of course, is that if you go back even like 10 years, certainly 15 years, a lot of the research that did make a big impact on the field was done by, like, one or two or three people with essentially a desktop computer, a regular thing. There wasn't even infrastructure around, and maybe not datasets around, that would require or make use of this kind of huge compute capacity.

Julian:

But that is sadly not the situation anymore. Maybe it's not sad. Maybe it's great that now we can actually do big science with AI. But yeah. So there we go.

Julian:

Then we started listing these things, and we wrote it up. We ran it through some of the people on our internal teams and got some feedback. And then we wrote it up, put it online, and got an enormous response, because we clearly struck a nerve with thousands of people, judging from the responses, who feel exactly like this. So that was pretty interesting.

Julian:

I don't know if it's good or bad.

Robin:

Was this partly brought on by the advent of these massive large language models? I mean, this has definitely been a meme in the field, in academia, for years, in terms of models getting larger and larger, but it seems to me the scale hypothesis has kind of won out in a certain sense.

Julian:

Yeah.

Robin:

Is it really about that, or is it AI as a whole, deep learning as a whole? And is there an RL angle here?

Julian:

I think it is deep learning as a whole, for sure. And Richard Sutton probably phrased it best in his bitter lesson. Basically, we take the bitter lesson, that half page of text, as a starting point and say: okay, the bitter lesson is true. Like, scaling works.

Julian:

So the bitter lesson is true. Scaling works, and your clever things don't matter much if they don't also scale; otherwise, we'll scale past your clever things. I think it's true for deep learning as a whole. RL is a little bit different, because RL isn't dependent on huge datasets as much as it is dependent on huge compute.

Julian:

Or in other words, it's even more dependent on huge compute. One outcome of this is that, basically, it's unapproachable in a slightly different way than huge unsupervised, or self-supervised, or supervised learning. Because you can build an environment and then train a lot on that, but then you need huge compute resources to actually train on it. So it very much applies to RL, but we're really talking about all of deep learning as it is right now.

Robin:

So do you see big compute experiments as almost low hanging fruit at this point, for the people who have those resources, and everyone else has to look higher up in the tree?

Julian:

That's a good way of putting it. Let me say that I am certainly not alone in having a long list of more or less formulated ideas that I'd been thinking about for years that I just could not do. And then I see someone else basically publishing a paper showing that it works, because they put a million dollars of compute into it. And of course, it feels in one way unfair. Like, yeah, I had this idea.

Julian:

Obviously, I was probably not alone with this idea. Probably hundreds of people had it. But we couldn't do it, because we did not have that amount of resources. So, yes, to some extent it's low hanging fruit, in the sense that it gets much easier to do research at this scale. Of course, you have other problems.

Julian:

You have the problem of actually using this money efficiently, in terms of: you need to build an organization that can efficiently use the money, and you need to build a compute infrastructure, which is very hard. I mean, there's a lot of good old fashioned engineering that goes into this, not just the machine learning trickery. And one of our points in the paper is that the typical academic organization is not set up for this. Because if you need to spend a million dollars on compute, well, you probably have a team of 10 people. And not every one of these is a researcher.

Julian:

Various people are support staff, managing people, building data pipelines, and so on. Now, in a university environment, you have mostly hired PhD students, sometimes postdocs. Postdocs are only around for maybe 2 years. PhD students are around maybe 4 to 5 years, 3 in some countries. And the PhD students need to have their own projects that they can write their own thesis about so they can graduate.

Julian:

And this whole structure, even if you had the money in academia, makes it really, really hard to do this kind of research. Whereas if you're DeepMind, you can have this kind of long term financing, and you can have a diverse team of complementary skills where no one is worried about having to write up a proposal, and then write up a thesis, and graduate, and then disappear from the team. So it's a very different structure. And this is just something intrinsic to how universities work, for better or worse.

Julian:

So it's not just about money. It's also about the organizational aspects of it.

Robin:

So we recently featured Jakob Foerster, a professor at Oxford, and he said in our interview that industry is a place for exploitation of ideas and academia is a place for exploration, and that it's a natural cycle that something that comes out of academia will then get carried on by industry to exploit. I think the implication is, if that's happened with some of this large compute AI, then that's a sign for academia to explore new things. You touched on strategies like that in the paper, in terms of looking somewhere else.

Julian:

I've gone around and said similar things a lot, and I want to believe it's true. I think it's probably true. Quite a few of the strategies we list are about this. How can you do something that is not done by the large and rich companies, in one way or another? And one way is to look at new problems, problems that people aren't exploring.

Julian:

So they need to be problems that maybe people don't think are important. I mean, in my academic history, and Georgios, who I co-wrote the paper with, is the same, we started out in video games back when nobody thought it was anything a serious researcher should do. But we just didn't care. We wanted to work on this, so we did. And throughout my career, I had so many people basically approach me and say: wow.

Julian:

Can you really work on this? But when are you gonna do serious research? This is not real AI, or this is not good for anyone. Why are you not working on cars or robotics or something like this? And I just told them they were jealous.

Julian:

And then I went on and did what I did. Of course, around 2015 or so, we saw this big upsurge of interest with the Atari environment and so on. And I had this weird feeling of being slightly vindicated and bypassed at the same time, I guess. It's probably even more true for other people in our community working on video games. I also worked on things in video games that people didn't care about, not just playing them, but also generating content for them, automatically designing them, modeling players, things like this.

Julian:

And, of course, you should take this strategy even further. Look at problems that nobody seems to care about, that are not serious, or not sexy, or not kosher in one way or another. It could be different applications, or it could be a different approach to the applications. Or it could be something that nobody wants, and that's exactly why you work on it. So, basically, working on new problems, new applications that are not cool, is one thing.

Julian:

Also, using methods that shouldn't work. The most classic example here is that all of deep learning exists because a bunch of people kept working on what would become deep learning, basically gradient descent training of neural networks, even when the consensus in the machine learning community was very much against it. Like, support vector machines are way better, they have a way better theoretical basis than gradient descent in multilayer perceptrons. Yet Yann LeCun, Jürgen Schmidhuber, Geoff Hinton, Yoshua Bengio, and a couple hundred other people kept working on this because they believed in it.

Julian:

And in the end, they were right, which is great. You know? And you should probably do the same. Because say that you work at an organization that can run million dollar compute experiments. You probably don't throw a million dollars at just anything. You probably only do it if you have a good reason to believe that it's gonna work.

Julian:

Now, if you are a lonely academic somewhere, maybe you should go for things that shouldn't work, things that go against the received wisdom. Things that are likely to fail, but fail in an interesting way, because you can. And, you know, things that, when they fail, will teach you something, and then give you an even stranger idea for something to try. Do things that have no basis in theory, or that go against theory, against the received wisdom, and just basically move ahead with it. And then yet another strategy for the exploration, and now we're getting into what some people consider really shaky ground, here is where some people would want to distance themselves from me and Georgios, is: do things that have somewhat bad optics.

Julian:

We're in an era where AI research has become maybe politicized, maybe not quite politicized yet, but people are getting very nervous about ethics. And first of all, I'm not gonna say that you should do anything you think is unethical. You definitely should not do anything that goes against your personal ethics. But chances are that, if you are a person in the world that cares about AI, your particular ethics is not the same as that which infuses the boardroom and upper management of a very rich company located in the New York or San Francisco area, which is mostly white people with a Western background, etcetera. There's this very, very specific idea of what ethics is that is steering what people are doing in AI.

Julian:

So take an example of what I'm talking about here. Back in 2017, me and two of my PhD students, Ahmed Khalifa and Gabriella Barros, who mostly drove the project, were looking into text generation, very primitive text generation by today's standards, LSTMs and stuff, and automatic autocomplete systems, basically. The things that would help you write emails and SMSes and so on. And the way they worked, they would not let you say fuck. They would never suggest that for you.

Julian:

And we're like, why would you want that? I use that in my language all the time. And then, we were all very influenced by Chuck Tingle, the fascinating author of, how do you best characterize this, absurd sci-fi, political satire, gay erotica. It has a lot of dinosaurs having sex with each other, or people having sex with unicorns, and so on.

Julian:

It is very gay. And we decided: what if we train text generation methods on his books and his very idiosyncratic language, and explore what that would be if you used it as an autocomplete system? Basically, let's throw away this assumption that you want to write neutral, clean English, whatever that is, and say you instead want to write whatever it is Chuck Tingle is writing. And I love that little paper.

Julian:

I love it even more because we had various people, including people in PR, basically say: don't release this. And I'm like, fuck you, I'm gonna release this. And this is the kind of thing that I think you maybe should be doing if you want to do things that the large, rich companies are not doing. Look at exactly what they wouldn't do, and think about why they wouldn't do it.

Julian:

Why would it seem to be a completely unreasonable thing to do? And then explore it.

Robin:

That example is interesting because it seems to fit so neatly into Internet culture. I mean, that type of text is not unusual on the Internet. However, these large language models that are supposedly compressing the Internet, they're also sanitizing it and removing that Internet culture component, or a large part of it. So that's interesting.

Julian:

I'm of multiple minds towards attempts at sanitizing and aligning language models. It's clearly valuable, but it also risks just erasing a lot of stuff.

Robin:

So do you feel like the tone with these strategies is one of pessimism? Like, oh, I can't believe we have to resort to these things? Or is there an optimistic way of looking at this, of saying, let's turn this into some kind of opportunity? Do you see it that way, or what is the feeling, really?

Julian:

The feeling is kind of resistant, riotous, mischievous. Like, an ordinary academic researcher who has been effectively cut out from doing a certain kind of research needs to basically develop resistance strategies. You need to become someone else that does some other research. And there's an element of spite or resistance in it. So it's not quite depressed.

Julian:

It's not quite optimistic either.

Robin:

And then in terms of RL specifically, I guess there are some other, more technical approaches to let small labs get that performance. I was thinking of things like purely JAX-based RL that can run the RL loop much faster, or using agents like Dreamer version 3 that can be much more sample efficient, getting to high performance dramatically faster than the legacy algorithms. So that might be one other angle that RL people specifically could take.
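
For listeners curious what that JAX angle looks like, here is a minimal sketch of why pure-JAX RL loops run fast: when the environment step is a pure function of (state, action), jax.vmap vectorizes it across thousands of parallel environments and jax.jit compiles the whole batched step. The toy environment and all names here are illustrative assumptions, not any particular library's API.

```python
import jax
import jax.numpy as jnp

# Toy environment step: a pure function of (state, action). The dynamics
# are made up purely for illustration.
def env_step(state, action):
    new_state = state + 0.1 * action          # move the point
    reward = -jnp.sum(new_state ** 2)         # reward: closeness to origin
    return new_state, reward

# vmap vectorizes the step across a batch of environments;
# jit compiles the whole batched step into one fused computation.
batched_step = jax.jit(jax.vmap(env_step))

key = jax.random.PRNGKey(0)
num_envs = 4096
states = jax.random.normal(key, (num_envs, 2))   # 4096 parallel 2-D states

for _ in range(100):                             # 100 steps of 4096 envs each
    key, subkey = jax.random.split(key)
    actions = jax.random.normal(subkey, (num_envs, 2))
    states, rewards = batched_step(states, actions)

print(rewards.shape)  # (4096,): one reward per parallel environment
```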

Julian:

Yeah, I 100% agree. I think both of the things you mentioned are really exciting. I think vectorization of environments is very cool. It is something I want to do more of in one of our current projects.

Julian:

We are currently thinking through how we could vectorize the whole thing to enable extreme multitasking. And Dreamer is also an interesting thing. There is, of course, the problem that if this works well for us, it works even better if you have a million dollar compute budget and can distribute it not over a certain number of CPU cores, or a small number of GPUs, but over a vast cluster. So, it is cool. It is interesting, and I think we should do it.

Julian:

But I also think that the relative advantages of having a very large compute budget and a large team don't go away. One thing that might be even more interesting is if RL models go the way of LLMs right now, where you open source them and you fine-tune them. You could basically see open source models for acting in certain environments that are just spread around on web pages and GitHub and BitTorrent and whatnot, where people do their own work tweaking them and fine-tuning them. That would be a very good future. I'm very bullish on the open source AI future, and I hope that the gatekeepers that be, Sam Altman and whoever, fail miserably in shutting it down.

Robin:

Uh-huh. Okay. So let's move on to your startup. You have a startup named modl.ai. You mentioned that you're cofounder and research director there.

Robin:

Can you tell us about modl.ai? What does modl.ai do, and what do you do there?

Julian:

Sure. modl.ai has been an amazing journey that I've learned a lot from, and we have built a lot of stuff that I'm very proud of and that I also hope is gonna make us money at some point. But basically, it started with a number of us who had been working in AI for games for a while. Me and several of my colleagues and my students and so on were thinking about how we could commercialize the work we've been doing in AI for games, so we started this company.

Julian:

We got funding, and then we started looking into what would actually work. We worked closely with people in the game industry, with game developers that had specific needs, and we looked into how we could solve those in a way that's repeatable and productizable. And one thing we found out was that a lot of the stuff we wanted to do, we could do. We could build level generators. We worked with King on level generators for Candy Crush, for example.

Julian:

We worked with several other companies, which I can't really talk about, on coming up with bots: multiplayer bots for their games, opponents and sidekicks, and a number of different things. But because each game differs so much from other games, we faced a real challenge in making a product that could actually be reusable and resellable to others. And as a startup that is venture funded and needs to be on a growth curve, you're not planning to build a consultancy. You're basically growing or dying. So we needed to find something we could build a good product out of.

Julian:

And what we eventually found, after trying a myriad of things, is game testing. Game testing is not the only thing modl.ai does, but that is currently our core product, the thing that we hope will make us money in the short to medium term while we're also working on other things. And the game testing that we are doing, for an RL podcast this might sound a little bit depressing, but the game testing we're doing is mostly based on exploration bots. We are exploring various ways that we can enhance these exploration bots, and we have some machine learning components in there.

Julian:

But at the core of it is something like the non-reinforcement-learning parts of Go-Explore. It's not exactly that, but that is a relatively close analog of what we're doing. The good part of this is that it actually works. You can use this to explore game spaces and find various bugs, you know, frame rate drops, falling through the environment, crashes, glitches of various kinds. There's a whole bunch of things, and it generates a report for the game developer, and they can prioritize and group their various bugs and go in and fix them.
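
As a rough illustration of the non-RL core of Go-Explore that Julian is gesturing at: keep an archive of distinct "cells" of the state space, repeatedly restore the simulator to a rarely visited cell, and explore from there with random actions. This is a minimal sketch under assumed interfaces; `env.reset`, `env.snapshot`, `env.restore`, `env.step`, `env.random_action`, and `cell_of` are hypothetical stand-ins, not modl.ai's actual system.

```python
from collections import defaultdict

# Hypothetical discretization: map a raw state to a coarse "cell" so that
# similar states (e.g. nearby player positions) collapse together.
def cell_of(state):
    return tuple(round(x, 1) for x in state)

def go_explore(env, iterations=1000, rollout_len=30):
    bugs = []
    state = env.reset()
    archive = {cell_of(state): env.snapshot()}   # cell -> snapshot reaching it
    visits = defaultdict(int)                    # cell -> selection count

    for _ in range(iterations):
        # 1. Pick a rarely visited cell from the archive.
        cell = min(archive, key=lambda c: visits[c])
        visits[cell] += 1
        # 2. Return to it exactly: games are resettable simulators.
        env.restore(archive[cell])
        # 3. Explore from there with random actions.
        for _ in range(rollout_len):
            state, crashed = env.step(env.random_action())
            if crashed:                          # crash, fall-through, etc.
                bugs.append(cell_of(state))      # log where it happened
                break
            c = cell_of(state)
            if c not in archive:                 # new territory: remember
                archive[c] = env.snapshot()      # how to get back here
    return archive, bugs                         # coverage map + bug report
```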

Julian:

Given how much of a game development budget is actually game testing, this turns out to be really valuable for our customers. We're looking now at how to expand this into more intelligent testing strategies, into team player bots, and so on. But basically, the more intelligent the thing you try to do, for some value of intelligence, the more you run into the problem that video games are very different from each other, and the more you need to tailor specific solutions to a specific game.

Robin:

Can you talk about what genres of games, you're working with?

Julian:

We are currently focusing on games with first person, or close to first person, perspective in three-dimensional environments. Within that, there's huge variation: FPSs, sports games, team-based games, single player games, and so on. But our main efforts are in 3D environments with close to first person views, because that's how we set things up. I can tell you, without breaking any NDAs, that we are working with at least one famous first person shooter at the moment, and one famous, sort of, racing game.

Robin:

Do you see more of a role for RL going forward, or are you seeing that other methods really outperform RL for this type of task, in general?

Julian:

The problem with reinforcement learning in game testing is the same as the problem with reinforcement learning in a billion other contexts: it overfits. It overfits really, really, really badly. This is something we all want to get away from. But generally, reinforcement learning, when given the chance, will learn a very brittle strategy that only works for one particular game, and one particular level in that game, and one particular angle towards that level, and one particular color scheme, and everything gets very, very particular. It is a huge challenge to create reinforcement learning agents that have some degree of generalization.

Julian:

And if you want a product that can work across multiple games, you need some of that generalization. Otherwise, you have to do retraining every time you want to test something. So I do think there's a future here, but we need to get further towards something like reinforcement learning foundation models that can actually generalize well enough. I think there is interesting work going towards that, but I don't think we're there yet.

Julian:

And this is why our product mostly relies on exploration bots right now.

Robin:

It definitely seems like an exploration heavy task. I don't know if it's fair to say, but my sense is that contemporary RL hasn't really solved the exploration piece so much as the exploitation side. So it makes sense that you're pointing to Go-Explore. We had the author of Go-Explore on the show not long ago, Jeff Clune. That makes total sense, because when testing, you're gonna want to explore all the nooks and crannies, places you haven't seen before.

Robin:

Deep RL is not that great at that.

Julian:

No. I just wanna say that I'm a big fan of Jeff's work. We know each other fairly well and go way back. Another piece of work from him that influenced a lot of our thinking is what's called video pretraining, where he pretrained Minecraft agents on large amounts of video data by also learning an inverse dynamics model that basically predicts actions from video transitions. And I think this kind of thing goes some way towards the foundation model RL that we're talking about.
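
A minimal sketch of the inverse dynamics idea, assuming a discrete action space and toy frame sizes: train a small network to predict the action taken between two consecutive frames, then use it to pseudo-label unlabeled gameplay video for behavioral cloning. This is not the actual VPT architecture, just the shape of the idea.

```python
import torch
import torch.nn as nn

class InverseDynamicsModel(nn.Module):
    """Predict the (discrete) action taken between frame t and frame t+1."""
    def __init__(self, num_actions: int):
        super().__init__()
        # The encoder sees both frames stacked along channels (6 = 2 x RGB).
        self.encoder = nn.Sequential(
            nn.Conv2d(6, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.head = nn.LazyLinear(num_actions)     # logits over actions

    def forward(self, frame_t, frame_t1):
        x = torch.cat([frame_t, frame_t1], dim=1)  # (B, 6, H, W)
        return self.head(self.encoder(x))

# Train on a small human-labeled set; dummy tensors stand in for real data.
model = InverseDynamicsModel(num_actions=10)
opt = torch.optim.Adam(model.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss()

frame_t = torch.randn(8, 3, 64, 64)     # frames at time t
frame_t1 = torch.randn(8, 3, 64, 64)    # frames at time t+1
actions = torch.randint(0, 10, (8,))    # true actions between them

loss = loss_fn(model(frame_t, frame_t1), actions)
opt.zero_grad()
loss.backward()
opt.step()
# Once trained, the model pseudo-labels unlabeled video, turning it into
# (observation, action) data for behavioral cloning at scale.
```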

Julian:

And I think it's a very interesting direction to go. Unfortunately, it's one that requires quite massive compute.

Robin:

Yeah. Especially to create that first pretrained model, which is, I guess, the analogy to GPT here, in terms of video. And then I guess the idea is the rest of us can cheaply fine-tune, just the way the rest of us can cheaply fine-tune LLaMA or something.

Julian:

Exactly. We might very well have this thing where, basically, we have a generic model for playing lots of 3D games, and you can fine-tune it on particular games. This is something we at modl.ai would love to be able to do. We're not there yet.

Robin:

So I noticed you have a number of papers on procedural content generation, some of them involving RL. Can you tell us a little more about this line of work?

Julian:

Yeah. So I've been working in procedural content generation for a long time. I started back in 2006 or so. We were trying to basically turn the problem of learning to play games around: instead of learning to play games, how could you learn to generate games?

Julian:

For a long time, I was mostly working with evolutionary computation, and did some really interesting stuff there, I think. We also explored constraint satisfaction algorithms, grammars, a lot of things. And we recently started working with RL here. RL is interesting because evolutionary computation, when it comes to evolving game levels, is quite computationally expensive. Now, reinforcement learning is even more computationally expensive, but you're kind of front loading the computation.

Julian:

First, you're spending all this time on training a level generator, and then you have a level generator that can generate levels really, really fast. So you're changing when and where you do the computational effort. At first, I was a little bit skeptical that it would actually work, but it did turn out to work, actually. So the thing is, instead of looking at an agent that plays a game, you're thinking of an agent that generates a game level, or maybe generates other aspects of the game. And you basically give it rewards just the same.

Julian:

So in our main formulation, what currently works best is giving the agent dense rewards for generating good levels. Whenever it improves the level in terms of some metrics, we reward it. And when it makes it worse, we punish it, essentially. The problem here is accurately yet rapidly calculating level quality, which is a very, very hard problem.

Julian:

But for some kinds of problems, some aspects of level quality, you can actually do this. What you do here is look at the connectedness between different parts of the level, path lengths, the existence of various things in the level: a good level must have a spawn point, it must have a path to the exit, it must have treasures, it must have monsters, and then maybe the treasures need to be behind the monsters, etcetera. And then you train an agent to do this, and it can go on and create things. You can also give the agent conditional inputs, so that you train an agent that can create levels of different types.
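
In the spirit of that dense-reward formulation (as in PCGRL-style setups), here is a minimal sketch: the agent's action edits one tile of a level, and the reward is the change in a cheap quality metric after the edit. The tile encoding and the metric are toy assumptions, far simpler than the connectivity and path-length checks Julian describes.

```python
import random

EMPTY, WALL, SPAWN, EXIT = range(4)     # toy tile vocabulary

def quality(level):
    """Cheap toy metric: one spawn, one exit, ~30% walls. Real metrics
    would check connectivity, spawn-to-exit path length, treasures, etc."""
    flat = [t for row in level for t in row]
    score = 0.0
    score -= abs(flat.count(SPAWN) - 1)               # exactly one spawn
    score -= abs(flat.count(EXIT) - 1)                # exactly one exit
    score -= abs(flat.count(WALL) / len(flat) - 0.3)  # moderate wall density
    return score

def step(level, action):
    """An action edits one tile; the dense reward is the metric change."""
    x, y, tile = action
    before = quality(level)
    level[y][x] = tile
    return quality(level) - before    # positive if the edit helped

# Random-policy rollout, just to show the loop an RL agent is trained in:
level = [[EMPTY] * 8 for _ in range(8)]
for _ in range(200):
    action = (random.randrange(8), random.randrange(8), random.randrange(4))
    reward = step(level, action)      # the agent would learn from this signal
print(quality(level))
```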

Julian:

Like, for example, easy and hard levels, or levels with the presence or absence of particular kinds of enemies, levels where the exit is close to the entrance or not, and so on. And this works pretty well. Even more interestingly, because it's really hard for a reward to capture every aspect of level quality, you can combine agents later on. You can basically train an agent that generates one kind of level.

Julian:

And then you train another agent that continues working on the level, but with another set of goals, and so on. Now, do these reward functions, these evaluation functions, capture accurately how good the level really is for humans? No, they don't. But they certainly get us part of the way there.

Julian:

And maybe this is something that you can then build on as a human, sort of keep editing it.

Robin:

Are the metrics that are used for level design kind of hand designed, or do you see some way around that? Like, I guess in large language models there's this notion of a reward model that tries to capture people's preferences. I would imagine that's quite challenging in a game environment, but if you had that, maybe that could drive your level designer.

Julian:

Yeah, it's funny that you mention it, because we're working on that right now. Basically, the equivalent of RLHF for level design.

Robin:

Oh, cool.

Julian:

It's tricky. It is actually quite hard. Another thing you can do is learn from examples of good levels. So we have this thing called the path of destruction, which basically takes good levels, destroys them in a myriad of different ways, and learns from the paths, like, learns from the trajectories from destroyed levels to good levels. So it's a little bit RL-like.
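
A minimal sketch of that path-of-destruction idea, under toy assumptions about the level encoding: start from a known-good level, corrupt it one tile at a time, and record (corrupted state, repair action) pairs by reversing the destruction trajectory. A model trained on these pairs learns to iteratively repair, and hence generate, levels.

```python
import random

def snapshot(level):
    return tuple(tuple(row) for row in level)    # immutable copy of a level

def destroy(good_level, num_steps=50, num_tiles=4):
    """Corrupt a good level step by step, emitting (state, repair) pairs."""
    level = [row[:] for row in good_level]
    examples = []
    h, w = len(level), len(level[0])
    for _ in range(num_steps):
        x, y = random.randrange(w), random.randrange(h)
        correct_tile = level[y][x]
        level[y][x] = random.randrange(num_tiles)    # corrupt one tile
        # Seen from the corrupted state, the right repair action is to put
        # the original tile back at (x, y).
        examples.append((snapshot(level), (x, y, correct_tile)))
    # Reverse, so training data runs from most-destroyed toward the good
    # level, matching the order in which a repairing agent would act.
    return list(reversed(examples))

# Usage: harvest supervised (state, action) repair data from one good level.
good = [[1, 1, 1],
        [1, 0, 1],
        [1, 1, 1]]                                   # toy 3x3 "room"
dataset = destroy(good)
print(len(dataset), dataset[0][1])                   # 50 repair examples
```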

Julian:

It's also a little bit like a diffusion model, but in a sequential and discrete way, so it's also quite unlike a diffusion model. And this also works pretty well. We've been able to do this for game levels, for LEGO structures, and so on. One interesting thing here is for people who don't care about coming up with levels that are good for humans, or interesting LEGO structures, though most people actually do care.

Julian:

There's the kind of meta use of this: generating new levels that would be good for training new bots that play the levels, basically. And this goes back to what we talked about before, being able to generalize reinforcement learning. In some of the work we did, we used these level generators to generate an infinite supply of levels that gradually get harder, so that you can train better game playing agents. And then, of course, you can use reinforcement learning for this too. So you can basically train level generating agents that generate levels that help you train game playing agents.

Julian:

And you could probably go further with this virtuous loop, I think.

Robin:

You know, I would say that sounds so meta, but now with Facebook's new name, I don't even like using that phrase.

Julian:

Right. Yeah.

Julian:

It's cool. There's a lot of interesting stuff to work on there. Some people call it open ended learning, which is a good term for it, I guess. But you can basically use reinforcement learning at every stage of this, which is not to say that reinforcement learning is necessarily the best way of doing it.

Julian:

My heart still beats for evolutionary algorithms, and I think there are a lot of good ways of combining these methods. But I'm definitely very much into this sort of thing, you know, level generators and game playing agents, or acting-in-an-environment agents, and how they combine with each other.

Robin:

So besides your own work, are there other things happening in RL lately that you find interesting?

Julian:

Depends on what you mean by RL, really. But yes, I think Dreamer is really interesting. I think Sergey Levine's work on actually getting offline Q-learning to really work well is kind of boringly interesting, because it's clearly very, very useful. The kinds of things he does, or his team does, to make it work well are very simple in themselves, and kind of anticlimactic. But the performance they get out of offline Q-learning is very cool.

Julian:

And these are the kinds of things that we probably need to build those reinforcement learning foundation models in the future. Otherwise, I look a lot into what's happening in the field that might be called open ended learning. And I follow very closely what's happening in quality diversity algorithms, and I'm interested in the use of quality diversity for reinforcement learning, for example. So that's where I go look, mostly.

Robin:

Awesome. Julian Togelius, thank you so much for sharing your insight with the TalkRL audience today. Thank you, Julian.

Julian:

Thank you, Robin. It's been a pleasure being here.
