A Look through the Window
By Mar Ginot Blanco and Philipp Gross
Last December we published our first research project, Machine Windows: Views from the Latent Space. The project asked: How can we see more, not less, of the visual material neural networks generate? How can we build a window onto the machine’s inner logic?
Usually, these complex computational processes are intentionally hidden from view: a human types in a string of words, and the computer miraculously spits out an image. We wanted to chip away at that mystical process and find new ways to explore and visually interpret some of the inner workings of image-generating neural networks. Merging two of our core disciplines, machine learning and visual design, we developed two explorative interfaces and an essay on our thinking.
The project was anchored by a three-month residency with Claartje Barkhof, a Dutch machine learning researcher with a background in art and design. Our internal team primarily consisted of Amelie Dinh and Iris Cuppen on the research side, Mar Ginot Blanco on the design side, and Philipp Gross, our head of data science. Below, Mar and Philipp talk about how they worked across these different divides and managed the exploratory process.
Can you start by describing the role you both played in the project?
I was supposed to be a facilitator for Claartje’s data science work. The plan was to build a prototype from her work, and I’d be like a translator for the team—keeping in mind our requirements for this work. But in the end, that wasn't really necessary because Claartje has a strong background in design as well. We did some workshops, and there was a lot of back and forth between the design team and Claartje, and me. So I think it was more about picking up the things Claartje built and supporting her while making sure we didn’t end up doing something we couldn’t really use from a practical perspective. After a couple of months, she finished the residency, and then I continued her work on data science-related development tasks.
In my case, I welcomed Claartje at the beginning and shared our interests from a design point of view. We’d been speaking internally about how we could merge tech and design in more of our work. That was our initial conversation, and I think we achieved a lot in our first workshop in Bonn, which included Philipp, Gunnar, Iris, Claartje, and me. We used that discussion to frame our interests for the project; I’d say that we gave Claartje some of this framing, but in the end, she took that and kind of made it her own.
Mar, had you worked with machine learning previously? How did you tackle this new subject area?
I hadn’t specifically worked on a machine learning project before. But Claartje was really good at explaining the basic concepts and the types of machine learning tools we’d be using. She drew a lot of diagrams, dots and lines and arrows pointing in different directions, and she’d repeat things to me over and over again. We’d sit together with a pen and paper when we were making the prototypes and trying to connect things. She couldn’t really use numbers with me, so she had to use this visual approach. Then with Philipp we didn’t speak so much while creating the tools but more as we were closing the project and homing in on what we were trying to communicate, making sure that was embedded in the tools. This was really important to us: clearly communicating the thing that was happening behind the interfaces that we couldn’t see. That’s when there were lots of conversations between Philipp, Amelie, and me.
That's true. In the last phase, as we were finalizing copy, there was a lot of back and forth with Amelie and Mar. My intention was to be as accurate as possible with these concepts, so that people with a technical background wouldn’t just think we were pulling out buzzwords, like “generative neural networks” and so on. We had to be accurate, but not so technical that laymen wouldn’t be able to understand.

Mar: When we launched the editorial, I felt I actually had more of a response from people working in the tech field, but people who aren’t necessarily experts in machine learning. I guess they’d heard about some of the concepts before, and so it was easier for them to follow along. And at the same time they enjoyed the visual output.
What were some of the questions that arose for you, or some of the main considerations from each side?
We envisioned an interactive UX from the start, and also debated which data science concepts were exciting and simple enough to integrate, without the need for a complicated backend server architecture. Machine learning tends to be compute-intensive and requires costly GPU server hardware for training and inference. But nowadays, there are frameworks like TensorFlow.js that enable a limited set of machine learning operations on the client side as well. Also, mobile devices have come a long way and offer beefy GPUs.
To keep the software stack simple, we decided to avoid a machine learning backend and to do any machine learning on the client instead. This constraint ruled out natural language models, which tend to have significant sizes. Instead, we quickly settled on visualization as the core domain. Low-resolution computer-vision models are small enough for mobile deployment, and I was surprised to learn that this restriction is not a barrier to our creative process; who needs crispy high-resolution images when we can embrace the artistic nature of upscaled icons? In the end, we used an off-the-shelf DCGAN model from 2015, a type of generative adversarial network that generates images. Such a network is exciting to work with because it connects the image space, a domain that we as humans can easily understand, with the overwhelming abstraction of high-dimensional vector spaces, the domain that deep learning algorithms operate on. In addition, Claartje created some new tile-generator code on top of the network, which was fun to see. Translating her tile generator from the deep-learning framework PyTorch to TensorFlow.js taught me a great deal, too, about converting models or doing optimisation in the browser.
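The connection between the two spaces works in one direction: you pick a point in the latent space, and the network renders it as an image, so moving between points produces smooth visual transitions. As a framework-agnostic sketch of that idea (the 100-dimensional noise vector is a common DCGAN convention; the function names here are illustrative, not taken from the project's code):

```python
import random

LATENT_DIM = 100  # DCGAN noise vectors are commonly 100-dimensional

def sample_latent():
    # Draw a latent vector from a standard normal distribution,
    # the usual input distribution for a DCGAN generator.
    return [random.gauss(0, 1) for _ in range(LATENT_DIM)]

def interpolate(z_a, z_b, t):
    # Linearly blend two latent points; t=0 gives z_a, t=1 gives z_b.
    # Feeding nearby points to the generator tends to yield
    # visually similar images, which is what makes smooth
    # interface interactions possible.
    return [(1 - t) * a + t * b for a, b in zip(z_a, z_b)]

z_a, z_b = sample_latent(), sample_latent()
midpoint = interpolate(z_a, z_b, 0.5)  # halfway between two "views"
```

In the actual experiments, each interpolated vector would be passed through the generator network to produce a frame; the sketch only shows the latent-space side of that pipeline.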
Initially, my main concern was to decide on a form of visual output that deviated from traditional machine learning results. We did some research, but in the end Claartje managed to conceive of both. She’s very multidisciplinary in her approach; she works on machine learning but also likes to paint, work with p5, and make music. When you see some of the results, even though they look quite abstract, I think they have some of her identity [IMAGE TILES]. It was also interesting to see that Claartje picked up on the notion of generative tools, based on some references we’d collected [IMAGE TOOLS]. That decision defined many technical aspects of the project. One thing I took away from the project is that it’s nice to work with constraints; they help you frame and navigate what you’re doing in such an explorative project.
Do you think the work from Machine Windows has the potential to translate over into other projects?
I’m looking forward to making tools like this for our clients. I can see we have the technical skills, and our design team has the will to do such things as well. With Claartje we achieved a lot in three months, working on a tool that was super constrained by the fact that it had to work on a normal internet browser. That made me think about what we could do with a longer period of time and with new constraints. We make tools all the time, but we don’t make tools for comms. I think this would be really interesting for BB. I also think there’s an opportunity if the client has certain needs, like the need to produce a lot of visual materials—if quantity is an important factor, and not necessarily with a huge budget.
Working with Claartje, but also with Amelie, Iris, and Mar from BB, was a lot of fun, and I think we can use this project as a blueprint for more internal collaboration between the dev and design teams. Maybe we can even find an experiment that doesn’t need a crispy design, where we can embrace the unfinished nature of data science sketches.

I also gained a better understanding of approaching machine learning in the browser. Nobody on our team had done that before, and it could easily be moved into a client project.
Can you elaborate on those different approaches?
Data science often starts with data exploration, so you do many things on your laptop, look at the data, and sometimes train and evaluate a model. If your computer doesn’t have enough computing power, you can rent some cloud hardware. So you don’t have to worry about the model size, which could easily be a few gigabytes, or about inference performance, like how many images can be generated per second. It's OK if it's one image every 10 seconds, because you just want to know if the model will give the desired results. For example, a Stable Diffusion tool takes a few seconds to generate an image from a text prompt.

But it’s a different story when you execute a model in a browser as part of a fluid user interaction. There the model has to be small enough to download quickly, and it has to be fast enough to keep the user engaged. The latter was a challenge because our experiments, Continuous Coordinates and Computational Compositions, require fast image-generation speed. In the beginning, this process was very slow, didn't work reliably, and needed some digging to iron out the wrinkles.
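The download constraint Philipp mentions can be made concrete with a back-of-envelope calculation (the model sizes and link speed below are illustrative assumptions, not measurements from the project):

```python
def download_seconds(model_mb, bandwidth_mbps):
    # Rough download time: model size in megabytes converted to
    # megabits, divided by the link speed in Mbit/s. Ignores
    # overhead, so real-world times would be somewhat longer.
    return model_mb * 8 / bandwidth_mbps

# A multi-gigabyte language model vs. a few-megabyte vision model
# on a 50 Mbit/s connection:
large = download_seconds(4000, 50)  # 640 seconds: unusable in a browser
small = download_seconds(5, 50)     # 0.8 seconds: fine for a web page
```

The gap of nearly three orders of magnitude is why a small low-resolution vision model was workable for client-side deployment while large language models were not.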
I was going to say that I liked this act of minimising things, from the data science as well as the design perspective, because it brought us to the core of what we were doing together with machine learning and design. I think that’s what’s nice about the project—beyond the visual output, we also had to minimise and transmit the process of what we were making, and it had to be very on point for that to happen.