<h1>Beyond #NeuroAI, we need #AI4Neuro—and we’re going to suck at it (at first). [Part 1]</h1>
<p>2023 came and went, and I somehow logged exactly 0 blog posts…for the first time since starting this blog in 2014. So I’ve resolved (once again) to write more and write more regularly, and as a consequence, post less polished thoughts. With that in mind, I’m going to start the year with a bang and potentially generate some good old (<a href="https://trends.google.com/trends/explore?q=NeuroAI,AI4neuro&hl=en-GB">niche</a>) academic controversy:</p>
<blockquote>
<p><strong>Forget about NeuroAI, let’s do more AI4Neuro{-science, -systems, -biology}.</strong></p>
</blockquote>
<p>No, of course, don’t forget about NeuroAI. But I think the latter—AI4Neuro—is quite different, and equally, if not more, important for neuroscience. When reading this, <strong>please remember that I’m not a prescriptivist or gatekeeper</strong> for who gets to do what in which field (or at least, I’m not trying to be). I’m simply making the case for calling something what it is, especially when it’s different from something else.</p>
<hr />
<p>Maybe we will look back on 2023 and chuckle at our collective naivety (<a href="https://en.wikipedia.org/wiki/AI_winter">yet again</a>), but with the increasing popularity and uptake of OpenAI’s ChatGPT throughout the year, and the introduction of Google’s Project Gemini in December, 2023 felt like the first time in a while when we could talk about artificial intelligence (AI) in earnest and without a hint of sarcasm or irony (at least, if you wanted to).</p>
<p>Sure, it’s not A-<strong>General</strong>-I yet and these systems and algorithms have tons of limitations surrounding their robustness, safety, ethics, lack of embodiment and intuitive physics, and they still make a lot of plain dumb mistakes. But in situations where I used to be much more comfortable referring to something as machine learning (ML), today, a system that helps me code, think through problems at a conceptual level, and generate “new” (i.e., stolen and recombined) images and sentences seems, undeniably, to be AI.</p>
<p>2023 also seemed to be the flagpole-in-the-ground year for “NeuroAI”—an intersection and marriage of subfields in neuroscience and ML/AI—with a kind of “declaration of independence” <a href="https://www.nature.com/articles/s41467-023-37180-x">position paper</a> authored by some of the biggest who’s-whos in computational neuroscience and ML, as well as the emergence of <a href="https://airtable.com/appWMCgd7CqsVIRza/shrTRBBqmrT74fZLb/tbl1t9cr5qpkYsrpb">beautifully curated resources</a> cataloguing the breakout of NeuroAI onto premier ML/AI conferences. In fact, it was Patrick’s <a href="https://twitter.com/patrickmineault/status/1730989784678490589">original tweet</a> collecting NeuroAI papers at NeurIPS that got me thinking in the first place: <a href="https://twitter.com/_rdgao/status/1731376771763757298">am I doing NeuroAI</a>, and <em>what is NeuroAI</em>?</p>
<p>After going to NeurIPS, absorbing the dizzying array of new research (and hand grenades), and chewing on these questions over the last month during the holidays, I came to this conclusion:</p>
<blockquote>
<p><strong>I am not really doing #NeuroAI</strong>;</p>
<p>(for the most part, and unless you are looking to give money for neuro + AI research, in which case, I am).</p>
<p><strong>Instead, I’m much more interested in doing AI for neuroscience (#AI4Neuro).</strong></p>
</blockquote>
<p>And because of this, I’m hoping for there to be a similar community for creating, curating, and contributing to a body of research that uses ML and AI to advance our understanding of neuroscience, neurobiology, and neurological disorders.</p>
<hr />
<h3 id="why-the-kerfuffle">Why the kerfuffle?</h3>
<p>“But Richard”, you might say, “NeuroAI already has Neuro <strong>and</strong> AI in the name, so it is clearly a superset of what you’re asking for, why do you want <em>yet another</em> field with its own name???”</p>
<p>Yes, I admit, it seems like a trivial and uncontroversial semantic debate. But you know me, I love pointless semantic debates, and I need to pump out CONTENT.</p>
<p>No but seriously: I think that while NeuroAI has a research agenda that is by nature very much related and complementary to AI4Neuro, at its core, it has a very different priority and asks fundamentally different questions. That being the case, I think clearly delineating the conceptual differences between the two would be beneficial; otherwise, we run the risk of doing one as a “side hobby” while primarily pursuing the other. In fact, <em>I think we have already been doing this, under a different name.</em> And since science is a sociological construct and we’re in the age of hashtags, #AI4Neuro needs its own branding goddamnit!</p>
<p>So in this multi-part blog post series, I’m going to explain what NeuroAI—to the best of my understanding, and with references, of course—is and is not, and why AI4Neuro is a separate research direction that should come with a <strong>different and clearly defined</strong> set of priorities, philosophies, and methodologies.</p>
<p>Following that, I will talk about why I think we currently suck at AI4Neuro (and will continue to for a little while), at least compared to, e.g., AI4Physics, which is precisely why it requires the collective effort of all the brilliant minds out there to think about some very different problems. The spoiler for the final takeaway is that, while I think developing and applying ML/AI tools in neuroscience will undoubtedly advance the field, it’s rather the active exercise of thinking about <strong>how neuroscience research can be done in a way that’s more amenable to using AI tools</strong> that may potentially push it out of the current, “pre-Copernican” ages.</p>
<p>But to conclude this first introductory post, I will set up the punchline with a detour, to give you some context for why I was thinking about (i.e., triggered by) this nomenclature thing between NeuroAI and AI4Neuro. It takes us back to my early grad school years, circa 2014-2017, and the confusion between “<em>computational</em> neuroscience” vs. “computational <em>neuroscience</em>”.</p>
<hr />
<h3 id="computational-neuroscience">“Computational Neuroscience”</h3>
<p>For the most part, my PhD was about developing and applying computational methods for problems in neuroscience, such as signal processing for analyzing neural time series, or statistics and “data science” for multi-modal brain data, including text and images. So I thought it wasn’t unreasonable to believe that my research was in “computational neuroscience”, and that I should talk to other “computational neuroscientists” at conferences like “COmputational and SYstems NEuroscience” (Cosyne).</p>
<p>This was, for the most part, a disaster: for some reason, it just so happened that the computational neuroscientists from computational neuroscience graduate programs that I was talking to at these conferences were by and large doing something entirely different and had little interest in talking to me. It wasn’t that people were mean or antisocial, and maybe I was communicating my work poorly, but it simply felt like the goals and problems of my work were not something they were familiar with.</p>
<p>But how could that be??? We were all, after all, computational neuroscientists, right? Wrong. As the years went on, I realized that a big part of computational neuroscience—arguably its dominant view—isn’t really about using computational methods <em>per se</em>, but that it is first and foremost a specific perspective of neuroscience, namely, that the brain is a computational device. So really, a majority of computational neuroscience research was about “the computational brain”, and how neural circuits implement computational algorithms, similar to how logic circuits in physical computers and the embedded software can implement the representation, transformation, and transmission of information (<em>Narrator: “we will come back to this.”</em>).</p>
<p>In other words, most of modern computational neuroscience was actually theoretical neuroscience working under a very specific perspective or assumption (one might even call it an ideology), and confusingly, it just happened to also employ computers and computational methods. Many, if not most, even developed and applied new computational methods for analyzing data, but it was, at the end of the day, for extracting insights about <em>computations</em> in the brain. But the people that were “just” developing computational methods for neuroscience and neurobiology? They were (and I was) like these weird niche people doing data analysis for experimentalists, on whom we depended for our existence and livelihood (not untrue).</p>
<p>At this point, especially if you are a computational neuroscientist, you might either disagree vehemently or simply shrug:</p>
<p>You might disagree with the premise that computational neuroscience is narrowly defined as being about neural computations. After all, the Wikipedia article on <a href="https://en.wikipedia.org/wiki/Computational_neuroscience">Computational Neuroscience</a> says this (which feels very much like it was edited and approved by an Eastern European researcher over 50):</p>
<blockquote>
<p>“Computational neuroscience focuses on the description of biologically plausible neurons (and neural systems) and their physiology and dynamics, and <strong>it is therefore not directly concerned</strong> with biologically unrealistic models used in connectionism, control theory, cybernetics, quantitative psychology, machine learning, artificial neural networks, artificial intelligence and computational learning theory;”</p>
</blockquote>
<p>If this resonates with you, I’m afraid you are one of these weird people, like me, who are into biology but can’t deal with the mess in a wet lab. You probably also go to the <em>other</em> <a href="https://www.cnsorg.org/">Computational Neuroscience conference</a>. But if you actually believe that modern computational neuroscience is not concerned with machine learning and artificial neural networks, well, I don’t know what blissful hole you live under, but you should come out and play with the rest of us.</p>
<p>On the other hand, you might shrug and say, “well, what’s the alternative to studying neural computation?”</p>
<p>Sure, maybe that’s what the brain does at the end of the day, but that’s not the point. The point is that “computational biology” and “computational physics”, as fields, are not primarily interested in the <em>computational properties</em> of biological and physical systems, but why is computational neuroscience? This will be an important point for the final part of the series, when we look at how AI4Physics and AI4StructuralBiology are done differently.</p>
<p>Nobody ever told me I was weird or unimportant, but I have talked to people like me who have similarly felt that they were just data analysts that work with neuroscience data. There’s nothing wrong with that, and those data analysis skills turned out to be a hot commodity when “data science” rolled around, with the kinds of time series, image, and tabular data that companies were dying for somebody to extract gold from.</p>
<p>But beyond that, I, like many of the others, do want to understand and contribute to a theoretical understanding of “how the brain works”. It’s just that I thought it was more important to see the brain as a biological and electrochemical system than as a computer. I wanted a “computational neuroscience” that was dedicated to computational tools for neurobiology, including models, statistical methods, machine learning, and now, AI.</p>
<p>And I still do, which is the reason why we are here today. At this point, you might be wondering “why the hell did I just read this?” Well, just replace computational neuroscience with NeuroAI, and you can basically skip the next parts of the series, where I want to make the case that we should avoid the same confusion with <strong>neuro</strong> and <strong>AI</strong> as we did with computational neuroscience. Not only because of a petty semantic disagreement, but <strong>because it will influence our understanding of “how the brain works”.</strong></p>

<h1>COSYNE22: Reporting on Main Conference and Timescales Workshop, and Some Reflections.</h1>
<p><em>Uhh so this whole thing got <strong>a bit long</strong>, and I thought about breaking it up into parts, but I just can’t be bothered to make two different posts and tweet it multiple times, so just use the table of contents above to jump to what’s interesting/relevant for you. The 3 sections are more or less standalone. Either that, or get a refill of whatever you’re drinking and <strong>buckle up</strong>.</em></p>
<hr />
<p>At the risk of trying to replicate my own success (but also to revive this blog), I will attempt to summarize some of my own takeaways after attending Cosyne 2022. You should by no means read this as an “objective” survey of the topics and trends at the conference, but rather views through a very filtered lens, partly because I just didn’t go to as many of the talks (lol), so the coverage will not be anywhere close to being complete (is that even theoretically possible?). This is not <em>just</em> a product of me skipping talks, though. I’ll try to convince you later that this was (at least in part) by design. But also, in the 3 years since Cosyne 2019, a lot of things have changed. While the scientific community is still struggling to claw back towards some kind of in-person normalcy, the upside is that most of the conference material exists online in one form or another, even the posters. More importantly, it exists in a systematically-curated and accessible form, in no small part thanks to efforts from the <a href="https://www.youtube.com/channel/UCzOTbZTHTubFNjANAR33AAg">organizers</a>, but also to community initiatives (like <a href="https://www.world-wide.org/cosyne-22/">World Wide Neuro</a>) and crazy <a href="https://twitter.com/SaberaTalukder/status/1507183050177884162?s=20&t=K17HvJ49jTwOujZuVSzuqA">individual efforts</a>, so I don’t feel so compelled to do that.</p>
<p>The biggest difference between 2019 and 2022, though, is that I’m 3 years older, having existed as a fly on the Cosyne auditorium wall for 3 years longer, and somehow finding myself to be partially integrated into this community. Actually, the most interesting thing I took away this year was in fact this particular piece of “meta-perspective”, and it manifests in several concrete ways:</p>
<p>First, my view of the science that was presented at the conference is now less of a point observation and more of a noisy trajectory or gradient. This shows up implicitly in reviewer biases about which abstracts were accepted (in terms of what’s “in” and “out”), something the organizers needed to be aware of in constructing the conference program, and which they briefly talked about in the opening remarks. The exercise of revisiting and writing up my notes really helped me consolidate what I took away from the main conference, and I will raise the disclaimer again that this in no way represents a “true” reflection of the conference themes, but my filtered version, so check out a few of my personal favorite themes in <strong>Section I</strong>.</p>
<p>Second, after 7 years of going to conferences, it finally occurred to me in Lisbon that I don’t (explicitly) know <em>how to conference</em>. There are a lot of conventional wisdoms floating around about how to do conferences, and sometimes specific instructions for specific conferences, most of which are about how to prioritize watching talks vs. going to posters vs. socializing. A lot of that is helpful, but I think it dawned on me that nobody really ever gave me <em>systematic</em> instructions on <strong>how</strong> to think about this, and that probably most people didn’t get this as part of the how-to-science manual either. In <strong>Section II</strong>, I will continue to not give instructions, because there is no one-size-fits-all advice: actually it’s different for different people, at different conferences, and most importantly, at different career stages. At the same time, it seems to be a shared experience that people often have the feeling that conferences are overwhelming, exhausting, and guilt-inducing. I will write a bit about this realization, and make some <em>meta-suggestions</em> for how to conference—in particular, <strong>setting goals that are appropriate for your interests, career stage, and personality.</strong></p>
<p>Finally, beyond passively participating in the conference program, I had the good fortune to actively shape a small part of it by co-organizing a workshop with <a href="https://twitter.com/roxana_zeraati">Roxana Zeraati</a> on neuronal timescales, along with a line-up of fantastic speakers (and cool people). This was obviously a stressful and intense experience, especially on the day of, but it was also exhilarating and insanely productive scientifically. Not only that, I felt like I was able to connect with the people there—<em>as human beings</em>—in a much deeper way than I was able to while just chatting at a poster, or casually meeting up as a part of socializing at the conference (especially being a relatively introverted person). In <strong>Section III</strong>, I will give a (not so) brief report of the discussions we had on <strong>“Mechanisms, functions, and methods for diversity of neuronal and network timescales”</strong>, as well as some reflections about my experience as a workshop participant and co-organizer.</p>
<hr />
<hr />
<h2 id="section-i-highlights-from-main-conference">Section I. Highlights from Main Conference</h2>
<p>I took notes for some talks and posters. They’re not very good notes. To be honest, I’m not even 100% confident I left with the correct takeaways. I did go back to check some parts of the talks to consolidate when possible, but you know what they say: the best way to learn the correct thing on the internet is to write it wrong (so please feel free to correct me if I misrepresented something).</p>
<p>Cosyne is no SfN, but the volume and breadth of cool work are still incredibly high—bordering on overwhelming—and everything looks interesting and at least tangentially relevant to the broader theme of neural dynamics and computations (at Cosyne? wow dude who would’ve thought), so it’s quite hard to pick which talks and posters to <em>not</em> go to. In the end, I basically gave up and attended things whenever there was nothing else I wanted to do more. Afterwards, I looked through which of the talks I happened to be at inspired the most thoughts, and they mostly fell into the following (unsurprising) categories (though I had to do some shoe-horning for some).</p>
<p><em>(Links typically point to recordings of that talk or the online version of the poster, when I could find them, otherwise poster numbers are in brackets)</em></p>
<hr />
<h3 id="i-love-weird-stuff">I love “weird” stuff</h3>
<p>Hands down, what I enjoyed the most are talks that are quite different from the “classical” Cosyne stuff, and I’m very grateful that the organizers decided to include a broader set of topics for the talks. Nothing against PCAs & ANNs, it’s just that the bandwidth (i.e., new information per talk) is much higher for talks on weird stuff I’ve never thought about before. I guess I enjoy the feeling of hearing about ideas that could fundamentally change how I think about something, or just ideas that are so completely unfamiliar to me that it triggers a novelty reward. Of these, I want to highlight 2 talks and a poster:</p>
<ul>
<li>
<p><a href="https://youtu.be/-7gNchGxI9s?t=614"><strong>Asya Rolls</strong></a> talked about the similarities and differences between the nervous system and the immune system, how they both make “memories”, and how memories in these two systems interact. She showed results from some super interesting experiments, and her talk is online, I highly recommend checking it out because it’s quite accessible as an outsider to immunology. The insight in a nutshell is that these two systems face similar environmental demands, in that they have to adapt to novel situations, especially situations where remembering how to optimally act might save your life the next time around (think a tiger vs. a novel pathogen). Like a lot of people, I got my Immunology degree from Twitter after COVID vaccines dropped, but never did I think about how the brain might be involved in the immune response. Among their crazy results: chemogenetic activation (via DREADD) of dopaminergic neurons in the VTA leads to a more potent lymphocyte response, which (I think shown in a separate experiment) can trigger a pathway through the bone marrow (?!?), resulting in proliferation of cells that can kill tumor cells more effectively after VTA activation. In plain English: when your “reward system” is activated in coincidence with an immune response to a foreign pathogen, the immune response to the same intruder is stronger the next time. On the flip side, you could have an allergic reaction (which is a “rogue” immune response) upon holding fake flowers if you have an allergy to real flowers, just because your brain recognizes it. A quite relevant example is “phantom COVID symptoms”, where knowing somebody who you were in contact with that tests positive will immediately make your throat tingle (though no causal claims are made with respect to this)—this literally happens to me every other week. Lastly, she showed that this neuro-immune link is quite specific, where ensembles of neurons in the insular cortex (or, insula) that were active during initial infection can trigger a similar immune response when artificially activated, but not via non-specific activation of neurons across the insula. This is literally the insular analog of memory engrams in the hippocampus and amygdala (see <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7577560/">Josselyn 2020</a>, shoutout to the 6ix). This really brought home the message for me that we should rethink what “placebo” means, because anything that the brain “sees”—or “thinks it sees”—could cause a very real bodily response, and her works explore the extent to which this is true.</p>
</li>
<li>
<p>Another talk I really enjoyed was <a href="https://youtu.be/m4w62iPv6yc?t=8686"><strong>Susanne Schreiber’s</strong></a>: it’s not really “weird” at all in the context of neuroscience, just mind-expanding in how she bridges from fundamental biophysics to large-scale network dynamics. In a lecture about single-neuron dynamics, you’d typically learn that neurons are classified as Type-I or Type-II depending on how their output firing rate scales with input, and, importantly, that this is a static characteristic of a neuron. Roughly speaking, Type-I neurons have a continuous transition from no-firing to firing, hence (theoretically) being able to fire at arbitrarily low firing rates given a weak enough suprathreshold stimulus. Type-II neurons, on the other hand, have a discontinuous jump: at some point when the stimulus gets strong enough, they go from not firing at all to firing at a much higher rate. Apparently, there’s a third type of firing pattern called homoclinic action potentials, which is a state-dependent mixture of Type-I and Type-II, and what Susanne talked about is that neurons can switch from being Type-I or Type-II to homoclinic due to a variety of factors, and this seemingly small single-neuron change can have big consequences for network synchrony. In particular, temperature and pH—two things that aren’t typically included in models of neuronal dynamics—can trigger such a change (in addition to extracellular concentration of some ions), and the resulting network (hyper)synchrony can resemble pathological states like epileptic oscillations. To me, this should be a textbook example of how biological and physiological <em>mechanisms</em> can impact brain <em>function</em> through altered <em>neural dynamics</em>, and you can check out the talk from the link above. (A toy f-I curve sketch follows after this list.)</p>
</li>
<li>
<p>To wrap up this section on weird stuff, <strong>Chaitanya Chintaluri</strong> gave a <a href="https://www.world-wide.org/cosyne-22/what-snow-neural-avalanche-069b867c/">poster presentation</a> on an idea so wild that I think it’ll either be a Science paper or forever-bioRxiv (no disrespect at all, and I told him this on the spot). If you’ve ever worked with in-vitro cultures, or are familiar with fetal neurodevelopment, you’ll likely have encountered the fact that networks entirely isolated from inputs often still have (quite large) network activations, i.e., network bursts or “avalanches”. It’s a strange and very robust phenomenon, and you see this in everything from primary cultures and stem cell-derived planar or spherical networks to slices. On one hand, it’s not that crazy to imagine that a couple of neurons might spontaneously fire, and when the connectivity is right, the small handful would trigger a large network response until exhaustion, and the cycle repeats itself after some neuronal or synaptic adaptation time. On the other hand, you can ask a simple yet perplexing question: why do these neurons want to fire, especially considering the fact that action potentials are quite energetically costly? Chai’s proposal was that getting rid of energy, specifically in the form of excess ATP, is <em>precisely the reason why neurons want to fire</em>, because accumulation of ATP in neuronal mitochondria is apparently toxic. There are a lot more details in the biology that he’s worked out, and heavy-duty stuff like free radicals, Reactive Oxygen Species, etc. etc., and they built a computational model to reproduce this effect. I have no idea if this will turn out to be true, but it goes along with my pet conspiracy theory that action potentials are not <em>for communicating with other neurons</em>, but rather stem from a neuron’s intrinsic and uncontrollable urge to seek relief from some kind of electrochemical pressure state.</p>
</li>
</ul>
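<p>To make the Type-I phenomenology above concrete, here is a toy sketch (my own, not from the talk, with made-up constants): the f-I curve of a leaky integrate-and-fire neuron, computed analytically, creeps up continuously from zero firing as the input crosses threshold, so arbitrarily low rates are possible. Genuine Type-II and homoclinic behavior requires richer conductance-based models (e.g., Hodgkin-Huxley with appropriate parameters), so treat this purely as a reference point for the continuous case.</p>

<pre><code class="language-python">import numpy as np

# Toy f-I curve of a leaky integrate-and-fire (LIF) neuron, computed analytically.
# Illustrates the continuous ("Type-I"-like) onset: the rate can be made arbitrarily
# low by bringing the input just barely above threshold. All constants are made up.
tau_m = 0.02      # membrane time constant (s)
R = 1e8           # membrane resistance (ohm)
v_th = 0.015      # spike threshold, relative to rest (V)
v_reset = 0.0     # reset potential, relative to rest (V)

for I in np.linspace(0.0, 0.4e-9, 9):      # input currents (A); threshold is 0.15 nA
    v_inf = R * I                          # steady-state voltage for this input
    if v_inf > v_th:
        # time to drift from reset up to threshold, then an instantaneous reset
        t_spike = tau_m * np.log((v_inf - v_reset) / (v_inf - v_th))
        rate = 1.0 / t_spike
    else:
        rate = 0.0                         # subthreshold: never spikes
    print(f"I = {I * 1e9:5.2f} nA  ->  rate = {rate:6.1f} Hz")
</code></pre>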
<hr />
<h3 id="neural-manifolds-plus-tm">Neural Manifolds Plus (TM)</h3>
<p>What’s a Cosyne blog post without some neural manifolds? Dimensionality reduction remains the workhorse of computational neuroscience, not only in neural data, but now also in high-dimensional behavioral tracking data. I think it’s fascinating that there is now essentially an entire subfield that not only <em>uses</em> dimensionality reduction methods, but works towards building a theoretical framework to explain these results. It feels a bit circular if you assume PCA came first, but if I look back beyond the last 5-10 years, it’s clear that latent variable models of neural population dynamics, as a theoretical construct, have always been a main topic for the community of people at Cosyne. It’s just that it used to be the case that both the development and application of these methods—I’m thinking GPFA, (Poisson) LDS, etc.—were reserved for the specialists and the computational collaborators, but now many more people at least have the opportunity to apply PCA or UMAP out of the box from Anaconda. In any case, it’s basically impossible to cover all the works on this topic in depth without a dedicated blog post (like <a href="https://xcorr.net/2021/07/26/dimensionality-reduction-in-neural-data-analysis/">Patrick Mineault’s</a>), so I simply have a couple of examples that I thought were interesting, either on the method front, or using it in some way to extend existing theoretical understanding of computations in the brain.</p>
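<p>(A quick calibration note before the examples: underneath most “neural manifold” claims is the same baseline computation, which is worth seeing once in its plainest form. The sketch below is my own toy example, not from any talk: a 3-dimensional latent trajectory is embedded into 100 synthetic “neurons”, and PCA on the resulting time-by-neurons matrix shows that a handful of components capture most of the variance. Everything in the list below uses far fancier machinery, but it bottoms out in something like this.)</p>

<pre><code class="language-python">import numpy as np

# Baseline "neural manifold" computation: PCA variance explained on a
# (time x neurons) activity matrix. Synthetic data: a 3D latent trajectory
# embedded in 100 "neurons" plus observation noise.
rng = np.random.default_rng(0)
t = np.linspace(0, 10, 1000)
latent = np.stack([np.sin(t), np.cos(2 * t), t / 10], axis=1)        # (time, 3)
activity = latent @ rng.normal(size=(3, 100)) + 0.1 * rng.normal(size=(1000, 100))

centered = activity - activity.mean(axis=0)
_, svals, _ = np.linalg.svd(centered, full_matrices=False)
var_explained = svals ** 2 / np.sum(svals ** 2)
print("cumulative variance explained by the first 5 PCs:",
      np.cumsum(var_explained)[:5].round(3))
</code></pre>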
<ul>
<li>
<p>One of the emerging themes in the realm of dimensionality reduction is to use models that combine continuous latent dynamics and discrete state transitions to learn a hierarchical representation of neural data, from which the discrete states can then be mapped to annotated or learned behavioral states/motifs. You’ll see this throughout this blog post. A classical example is a bunch of (local) linear dynamical systems (LDS) with between-state transitions defined by a (global) Hidden Markov Model (HMM). <a href="https://youtu.be/oGzXMhxlx3g?t=13261"><strong>Adi Nair</strong></a> talked about a similar approach of using recurrent switching LDS, and I always find it interesting when these kinds of methods are applied to weird subcortical areas, in his case, the hypothalamus. I know close to nothing about the hypothalamus, but I guess the traditional view is that specific ensembles of (ventromedial) hypothalamic cells encode for specific and non-overlapping sets of actions related to aggression and mating (like sniffing, mounting, etc.). This wasn’t really the case in their data at the single-cell level, so the next best hypothesis is that cells with mixed selectivity coordinate to form a population code (like a humpy/bitey PFC), and they use an rSLDS to model the neural dynamics within and across discrete states. He showed, surprisingly, that there is a mixture of hand-annotated actions in all the switching states, so they don’t have representations at the level of discrete actions, but some contextual state. Or mood, if you could call it that in a rodent. And because it’s a bunch of LDSs, you can do linear system analysis, and he showed that each of the latent linear dynamical state spaces (i.e., the linear dynamics matrix) has a long-timescale(!) integration dimension, and that the proportion of aggressive actions scaled with the time constant of that integration dimension, like some kind of rage meter. (A toy generative sketch of a switching LDS follows after this list.)</p>
</li>
<li>
<p>The hippocampus is another system for which there is a prevailing theory (or dogma, depending on who you ask): surely everyone knows about the Nobel Prize-winning discovery of place cells by now, and it’s known that when the animal is placed in a different environment (<strong>same</strong> physical arena but with different sensory cues), those place cells still have place fields in the new environment, just at different locations compared to the old one. The prevailing theory of “remapping” proposes that those place cells essentially get activated by different sensory inputs in the two different environments, hence the “places” they represent are different. In plain English, it’s like each place cell in the hippocampus is an index card (like in the library), and when you take the stack from one library to another, you can rewrite the code on each individual card—<em>independently</em>—to point to a different section. <a href="https://youtu.be/m4w62iPv6yc?t=1667"><strong>André Fenton</strong></a> proposes quite a different and fascinating hypothesis based on neural manifolds in the hippocampus. If I understand correctly, his theory of “reregistration” goes like this: instead of place cells with independent degrees of freedom in their place representation, most place cells actually participate in the <em>same</em> population code that’s stable across different environments, which means that their co-firing (or correlation) structure is important and preserved to form this theoretical neural manifold. In some sense, this population dynamic just happily trucks along the on-manifold trajectory, as it does, no matter what environment the animal is in. How the remapping or reregistration comes in is through a small handful of anti-cofiring cells, which transform or “project” (in a loose sense) the stable population code differently in different environments. Extending the index card analogy, it’s as if most index cards have just a single and permanently printed code, while a few of the rest make up the codex that tells you how to interpret the same code in different libraries, e.g. under one filing system (say, in Canada), the number on the index card takes you to the non-fiction section, while under a different filing system, the same code takes you to the comic book section. He showed tons of experiments and analyses, and I’m not confident I’ve completely digested the link between the conceptual level and the specific analyses, but he uses Isomap—an old school nonlinear dimensionality reduction technique—to form these manifold representations, and I wonder how many of our neuroscientific conclusions would be changed if different (and more complex) dimensionality reduction techniques are used in practice.</p>
</li>
<li>
<p>Another interesting thread of investigation is, of course, the mechanisms behind the consistently observed low-dimensional neural manifolds. <a href="https://youtu.be/auVifWwQaG8?t=8282"><strong>Renate Krause</strong></a> looks at this in vanilla recurrent neural network models trained to perform some simple classical tasks, which are good model organisms because they exhibit two puzzling characteristics: first, the trained neural dynamics are (linearly) low-dimensional, often characterized by the small number of principal components required to explain most of the variance in population rate; second, the trained network weights are usually much higher dimensional, measured both by the number of PCs required to capture total variance as well as impact on task performance when removed, which means that there are many directions that the weight matrix can project the activity onto. So the question is, how are the dynamics kept to be low-dimensional while the weight matrix can push it around in many directions? To bring harmony to these two contradicting observations, she defines a new concept of “operative dimension”. I think the gist of it is that, instead of considering the global weight matrix, she takes points <em>along the low-dimensional neural dynamics</em>, and finds directions in the weight matrix that will produce the largest local change in activity <em>if removed</em>. Through this procedure, she finds that <em>it is possible</em> to extract a low-dimensional connectivity that shapes the low-dimensional dynamics without performance loss, you just have to look in the right way. This is quite complementary to the strand of work that <em>starts from low-dimensional connectivity</em> via, for example, low-rank constraints, and it’d be really interesting to see if they somehow converge onto the same learned connectivity or <em>operative dimensions</em>.</p>
</li>
</ul>
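<p>Since switching linear dynamical systems showed up in several of the talks above, here is a toy generative sketch with made-up parameters: a discrete HMM state picks which linear dynamics matrix drives a continuous latent at each time step, and the latent is read out into synthetic firing rates. The actual rSLDS used in Adi Nair’s work additionally lets the discrete transitions depend on the continuous latent, and the real work is fitting such a model to data (e.g., with the <code>ssm</code> package from the Linderman lab) rather than simulating from one, so this is only the skeleton of the generative story.</p>

<pre><code class="language-python">import numpy as np

rng = np.random.default_rng(0)

def rotation(theta, decay):
    """A decaying 2D rotation, used as one state's linear dynamics matrix."""
    return decay * np.array([[np.cos(theta), -np.sin(theta)],
                             [np.sin(theta),  np.cos(theta)]])

# Two discrete states with different (made-up) linear dynamics, sticky transitions.
A = [rotation(0.05, 0.99), rotation(0.30, 0.95)]
P = np.array([[0.98, 0.02],
              [0.05, 0.95]])
C = rng.normal(size=(20, 2))     # readout from the 2D latent to 20 "neurons"

T, z, x = 500, 0, np.array([1.0, 0.0])
states, latents, rates = [], [], []
for _ in range(T):
    z = rng.choice(2, p=P[z])                     # discrete HMM transition
    x = A[z] @ x + 0.05 * rng.normal(size=2)      # state-dependent linear dynamics
    states.append(z)
    latents.append(x)
    rates.append(np.exp(C @ x))                   # Poisson-ish firing rates
states, latents = np.array(states), np.array(latents)
print("fraction of time in the fast-rotation state:", states.mean())
</code></pre>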
<hr />
<h3 id="unstructured-behavior">Unstructured behavior</h3>
<p>You really can’t understand the brain without trying to understand the behavioral demands animals face, so I’m pretty happy that Cosyne continues to move in a direction that includes more behavioral work, especially <em>unstructured behavior</em>. Similar to what I said about neural data above, you can learn some pretty interesting continuous and discrete dynamics when looking at the behavioral data alone, and it’s even more cool when they are linked to low-dimensional representations in neural data. In many of these algorithms, the heavy-lifting is done by nonlinear transformations and embeddings, i.e., deep neural network, instead of PCA. You might sacrifice some interpretability, but the nice thing about behavioral data is that you can always look at the learned sequences or motifs and make sense of it just by looking at or listening to the data.</p>
<ul>
<li>
<p><a href="https://youtu.be/auVifWwQaG8?t=2839"><strong>Bob Datta</strong></a> started off the Cosyne talks with an absolutely stunning keynote, centered on a tool his lab developed, <strong>MoSeq</strong>. I don’t even want to write much about it, because I’d just do the talk a disservice. And really, you basically don’t need any prior knowledge in anything to understand and enjoy most of the talk. But in a nutshell, MoSeq is a method that models time series data, whatever modality it comes from, and when applied to videos of free behavior, it automates the segmentation of behavioral “syllables”, or short snippets of stereotyped actions. This is in contrast to the hand-labeling approach of having some poor grad students sit in front of the computer all day annotating when the animal is doing what, on a frame-by-frame level. Besides bringing sweet, sweet relief to the hand labeler, the unsupervised algorithmic approach does it with perhaps a little less labeler bias, deciding on motifs purely based on statistics. That is not to say that the algorithm is free of bias and human intervention, since you presumably still have to pick hyperparameters like number of syllables to extract (or do so via cross-validation). But once you have meaningful syllables or motifs, you can do a lot of interesting things with them, such as characterizing individual motif statistics like duration, as well as the transition structure between motifs that make up longer sequences of behavior. You can also ask how interventions like giving mice drugs change their motifs and transitions, as well as their neural correlates. Bob talks about all the above, the talk is fucking awesome, and I can’t imagine how many grad student-hours went into making these snazzy figures. In the end, he speculated about how these syllables may be “atoms” of behavior, and I think it’s a fascinating thought that deserves careful consideration. Obviously, the algorithm decides the form and duration of these syllables, and statistically, it’s always a trade-off between generalization vs. precision, i.e., how different does a snippet have to be from the rest to be considered an entirely different syllable? Furthermore, you can always split an atom, and you can construct things from atoms, and when animal behavior is involved, I definitely wouldn’t bet my money against the possibility that individual variations in syllables are context-dependent. In other words, maybe there is an average canonical “sniff”, but how individual sniffs differ may be a function of what the animal was doing right before, or the larger context it’s in, not just random noise. Anyway, highly recommend just watching the talk, and I’m really stoked about this sort of stuff, it’s right in the mix with embodied cognition, and would be especially great if we can move it into the wild.</p>
</li>
<li>
<p>Unaffiliated but very much related, <a href="https://youtu.be/oGzXMhxlx3g?t=10386"><strong>Yarden Cohen</strong></a> deals with the same high-level problem of segmenting continuous time series into meaningful “syllables”, except the syllables here are of a totally different kind. Yarden studies songbirds like canaries, which can apparently produce extremely long and beautiful songs (check the video for a very entertaining example). Similar to behavioral videos, you can ask expert annotators to look at a song spectrogram and segment motifs by hand, or use machine learning approaches to do it automagically—enter <strong>TweetyNet</strong> for birdsong segmentation and classification. From well-segmented syllables, one can ask analogous questions of how syllables transition from one to the next to make up songs, as well as their neural correlates. All my questions about behavioral motifs stand here, though you might imagine birdsong syllables to have less (meaningful) variability from instance to instance?</p>
</li>
<li>
<p>Wherever brain and cognition are involved, you could potentially set up a contrast between discrete symbolic computation vs. continuous dynamical systems. Not that this view was (implicitly or explicitly) championed by any of these speakers, and they all eventually have a mixture of both perspectives, but <a href="https://youtu.be/DvsflwKOWs0?t=10964"><strong>Heike Stein</strong></a> gave a really interesting talk that delved more <em>into</em> the discrete motif (or state) and explicitly described the dynamical system. Instead of condensing high-dimensional data to <em>a priori unknown</em> low-dimensional embeddings or syllables via unsupervised learning directly, she first uses DeepLabCut to just get the motion trajectories of the 4 paws of mice running on a little laddered wheel throughout the course of a motor learning task. The mice are (quite literally) stumbling over themselves, one foot over another, when they first start trying to run on the wheel. Over time, they somehow learn the phase offset relationship between one paw and another that’s required for a stable gallop. She models the paws with coupled oscillators that can oscillate at 3 different speeds, where stable movement is achieved essentially when the oscillators are locked to the same speed (but with a half-period offset). The different frequencies and coupling are instantiated by—you guessed it—a Hidden Markov model that transitions between different (discrete) states, where the synchrony of the paws is different between these states, some of which are desirable for smooth running on the wheel. In other words, it’s a <em>switching rotational dynamical system</em>. You can start to see the high-level similarities between the methods employed by these very different subfields of computational neuroscience. Anyway, I’m always a fan when somebody explicitly writes down a set of dynamical equations, and even more so when it can start to capture something as complex as animal behavior with coupled oscillators. (A minimal HMM-segmentation sketch follows after this list.)</p>
</li>
</ul>
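<p>All three of the above lean on the same statistical skeleton: assign each frame of a continuous feature time series to one of a small number of discrete states with a hidden Markov model, then analyze the resulting “syllables” and their transition structure. Here is a deliberately generic sketch of that skeleton, using <code>hmmlearn</code> on placeholder pose features; MoSeq, TweetyNet, and the coupled-oscillator model all use much more tailored emission and transition models, so this is only the common core, not any of their actual pipelines.</p>

<pre><code class="language-python">import numpy as np
from hmmlearn import hmm    # pip install hmmlearn

# Generic "syllable" segmentation skeleton: fit a Gaussian HMM to a
# (time x features) array of pose/keypoint features, then read off one
# discrete state label per frame. Placeholder data stands in for real tracking.
rng = np.random.default_rng(1)
pose_features = rng.normal(size=(5000, 10))

model = hmm.GaussianHMM(n_components=8, covariance_type="diag", n_iter=50)
model.fit(pose_features)
syllables = model.predict(pose_features)           # one label per frame

# Basic syllable statistics: durations and the transitions between motifs.
changes = np.flatnonzero(np.diff(syllables)) + 1
segments = np.split(syllables, changes)
durations = [len(seg) for seg in segments]
print("number of segments:", len(segments))
print("mean syllable duration (frames):", np.mean(durations))
</code></pre>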
<hr />
<h3 id="bringing-biological-realism-into-models-of-computation">Bringing biological realism into models of computation</h3>
<p>I don’t know if this is going to be a “trend” at Cosyne, but I think injecting biological realism into models of neural computation <em>should</em> be a hot topic moving forward, even if challenging. We have lots of great models of neural computation in deep neural networks (DNN), but as great as they are in reproducing performance on visual and other tasks (recent Twitter debates notwithstanding), they don’t possess much biological realism aside from the rough preservation of anatomical hierarchy (e.g., through the ventral stream)…and the “neural network” in their name. Yes, I know there was a whole BrainScore workshop on representational similarities between DNNs and the brain, but that’s more about outcomes of models, not really <em>introducing more</em> structural elements into the model a priori, or biological inductive biases, if you want to call it that. And that’s not to say those works are not interesting or valuable, it’s certainly puzzling why some architectures can produce real neuron-like activity. But coming back to my point here, I personally find it important to also ask how including biological realism (e.g., local connectivity constraints, cell-type specialization, excitation and inhibition, etc.) can not only alter task performance, but more so, how these features <em>unlock and/or constrain</em> the types of computations a network can perform and how.</p>
<ul>
<li>
<p>One of the most obvious biological details missing from the currently popular models of brain computation is action potentials. We don’t have to go down the spike vs. rate rabbit hole here; that discussion needs a book (or periodic Twitter debates). But assuming you care about spikes to start with (your brain obviously does), I think it’s interesting to ask how spiking networks can implement learnable computations like rate-based DNNs do. That’s why I was pretty excited that <strong>Dan Goodman</strong> gave this year’s <a href="https://www.youtube.com/watch?v=GTXTQ_sOxak&list=PL9YzmV9joj3EvcvT0eHoYwBMcV-V3exDV&ab_channel=NeuralReckoning">tutorial</a> on spiking neural networks, because these things certainly influence what people—especially junior researchers—then remember and find interesting moving forward, consciously and subconsciously. I’m not saying a linear integrate-and-fire neuron has all the biological details we need; I just think it’s a lot closer to how a real neuron works than a dot product and sigmoid. In the tutorial, Dan goes through some basic spiking neuron models, as well as some fancier stuff you can add like adaptive spike thresholds. I work with Brian2 regularly, so I’m familiar with SNNs as mechanistic simulations, but it was very cool to learn about surrogate gradient descent (quite clever) and training spiking networks in pytorch to do tasks, especially recasting the same coincidence detection problem to be solved by delays vs. weight matrices. All the material is online, and you can follow along with his recording, highly recommend. (A tiny surrogate-gradient sketch follows after this list.)</p>
</li>
<li>
<p>Adaptation (e.g., spike frequency adaptation) is another effect largely missing from DNN models of the brain, and I saw a really cool poster from <strong>Victor Geadah</strong> (2-41) that incorporates this into rate-based recurrent neural networks. Basically, instead of a fixed activation function for the neurons in the network, they have a flexible activation function from a family of parameterized functions, and those adaptation parameters are controlled by an internal RNN. It’s a bit of a yo-dawg situation, because each unit in the big RNN has a small RNN inside that controls its adaptation, but the detail doesn’t really matter here, the point is that now every unit in the network has a flexible and dynamic activation function that changes over time as a function of the input, as well as the cell’s own state and that of the network as a whole. The cool part is that even though the whole thing is trained end-to-end to maximize performance on some sequential tasks, the network ends up mimicking adaptation seen in experimentally measured neurons. I think he also compared to the condition where each unit’s activation is learned, but fixed over time, and finds that the dynamic adaptation is better. I found a <a href="https://arxiv.org/abs/2006.12253">preprint online</a>, but I think the poster had more. I think this could potentially have a really nice link to gated RNNs in machine learning (e.g., LSTMs and GRUs), and can unlock questions about potential mechanisms of controllable adaptation, both internal and external to the neuron (like neuromodulation).</p>
</li>
<li>
<p>Another example of bringing realism was <strong>Mala Murthy’s</strong> talk (speaking on behalf of Ben Cowley), where she really took biological correspondence to the max when comparing DNNs to a circuit in the fly brain. The talk is unfortunately not online, so I know I will butcher it since I can’t revisit it, and I would’ve really liked to. But the gist of it is that, since the fly brain is quite manageable in size and already well-characterized, where many neurons are even individually named like in C. elegans, you could actually build a DNN model with one-to-one correspondence for some of the neurons. You can imagine churning through the whole DNN-is-brain pipeline here, training the network on task alone (where the input is what the fly sees and output is its task-related behavior, I believe) and compare activation in the matching real vs. artificial neurons, and that would already be cool. But they take it to a whole different level: because those neurons are individually identifiable, you can do single neuron knock-out via optogenetic(?) inactivation, and see how shutting off each neuron impacts the fly’s behavior. Then, you can mimic that whole procedure in the artificial network by simply ignoring the activation of those corresponding neurons while asking it to match the behavior of the minus-1-neuron fly. I think this approach is super cool, and definitely starts to move beyond simple correspondence towards causality. If anything, it gives us more confidence that the derived DNN is a more faithful model of the real thing we’re trying to reproduce.</p>
</li>
</ul>
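<p>Since the surrogate gradient trick is the piece that makes “training spiking networks in pytorch” possible at all, here is a minimal sketch of the idea (my own illustrative version of the general recipe, not code from the tutorial; the surrogate shape and constants are arbitrary): the forward pass emits hard spikes by thresholding the membrane potential, while the backward pass pretends the threshold was a smooth function so that gradients can flow through it.</p>

<pre><code class="language-python">import torch

# Surrogate gradient trick: hard spikes in the forward pass, a smooth
# pseudo-derivative in the backward pass. Names and constants are illustrative.
class SurrGradSpike(torch.autograd.Function):
    scale = 10.0    # steepness of the surrogate derivative

    @staticmethod
    def forward(ctx, v):
        ctx.save_for_backward(v)
        return (v > 0.0).float()                  # hard threshold: spike or no spike

    @staticmethod
    def backward(ctx, grad_output):
        (v,) = ctx.saved_tensors
        sg = 1.0 / (1.0 + SurrGradSpike.scale * v.abs()) ** 2    # "fast sigmoid" derivative
        return grad_output * sg

spike_fn = SurrGradSpike.apply

# One differentiable step of a thresholded layer (batch x neurons):
v = torch.randn(32, 100, requires_grad=True)      # membrane potentials
spikes = spike_fn(v - 1.0)                        # threshold at 1.0
loss = spikes.sum()
loss.backward()                                   # gradients flow via the surrogate
print(v.grad.abs().mean())
</code></pre>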
<hr />
<h3 id="humans-oscillations-and-mechanistic-modeling">Humans, oscillations, and mechanistic modeling</h3>
<p>I complain about the lack of representation on oscillations and humans at Cosyne all the time, so it’s only right to give credit where credit is due. This year, I was genuinely surprised by the number of presentations on human electrophysiology and fMRI, as well as mechanistic modeling without task constraints, and there was even a whole session of talks on neural oscillations!!! Every time I passed by a poster that had something to do with one of these areas, I did just the tiniest fist pump: another win for the team. I ended up noting down quite a few interesting ones, but I don’t want to describe them all in detail, so here’s a lightning round to close up the scientific programme:</p>
<ul>
<li>
<p><a href="https://youtu.be/oGzXMhxlx3g?t=7714"><strong>Michael Long</strong></a> gave a fantastic invited talk on dissecting the neural substrate of human language understanding using invasive electrophysiology, i.e., electrocorticography (ECoG). As far as the intersection between human cognitive neuroscience and computational neuroscience goes, this is as good as it gets. Through some clever tasks and free conversation, he shows functional specificity of different brain regions for different phases of communication (presumably based on their high gamma response): auditory areas for perception, motor areas for speech production, and—what he focuses on—mid-frontal regions for “planning”. These regions are not for generic planning involving verbal instructions or sound production, but really specifically for language understanding. He even showed some causal perturbation data where electrically stimulating these planning areas results in some really interesting errors in the participants’ response—and this is just half of the talk. The other half looks at data from some really cute singing mice, also with causal manipulations to boot. Really nice talk, and probably one of the few times you will see linguistic theory in a Cosyne talk. As a side note, the task he uses would definitely be a good one to deploy on RNN-based language models, like the ones Vy and Shailee presented on in our workshop (if you end up reading that far).</p>
</li>
<li>
<p><a href="https://youtu.be/m4w62iPv6yc?t=5082"><strong>Anna Shpektor</strong></a> talked about hierarchical representation of sequences in the human entorhinal cortex, with fMRI data (<em>gasp of horror</em>). Sharing the Nobel Prize with the aforementioned place cells in the hippocampus, the entorhinal cortex (EC) has grid cells, which are thought to be the basis functions of the downstream dirac delta-like place representations. There are two prominent observations about grid fields (is that what they’re called? Like place fields?): first, they can emerge in sequential tasks that don’t necessarily have to do with space, thus leading to the interpretation that they are important for representing generic sequential structures in some platonic concept space; second, grid cells are anatomically hierarchical, meaning that a grid cell’s grid field resolution depends on that cell’s position along the dorsal-ventral axis of the EC. Anna asks if the union of these two observations is also true: do abstract sequences that have some temporal hierarchy, e.g., from letters to words to sentences, also have representations in the EC that fall along the front-back axis (anterior-posterior for the weird upright human)? Long story short, she finds this to be the case, and presents great data in a super engaging talk. But as beautiful as her science was, the ending of her talk was one of these things that I will always remember, even when I’ve forgotten all about the results: as a Russian-born scientist, she spoke out against the war in Ukraine, and used this opportunity to highlight some organizations that were helping refugees from, as well as people still, in Ukraine. Here’s a <a href="https://docs.google.com/spreadsheets/u/0/d/1kBlxPToeCFXL3Mj7jSkoEe8049qmq0OLxjBEzqJaetU/htmlview#gid=0">Google Doc</a> compiling some of the main ones, and you can check the video of her talk for some more. I don’t want to dwell on this too much because her science deserves recognition on its own, but seeing this live—even though it reminds us of the real and ongoing tragedy—somehow brought a weird sense of normalcy (and only then did I notice the yellow and blue stage lights throughout the talks), perhaps because we still acknowledged and cared about the world outside of the bubble.</p>
</li>
<li>
<p>Okay, lightning-round for real now: on the experimental side, <a href="https://youtu.be/m4w62iPv6yc?t=12297"><strong>Jacob Ratliff</strong></a> talked about the role of somatostatin+/nNOS+ cortical inhibitory neurons in network synchrony. It’s always interesting to learn more about cortex-wide synchrony (and the associated low-frequency oscillations) during low-arousal states of the animal, and Jacob presented some cool data on the causal role of these genetically identifiable and long-range inhibitory neurons. Optogenetically activating them while the animal is in an alert state (with desynchronized network activity) will synchronize the pyramidal neurons (at 4-6Hz) and put the animal into a quiet state. I think that’s quite remarkable; I didn’t expect such a small and niche cortical population to have such a global effect, which I tend to think is a role reserved for subcortical nuclei. <strong>Ana Clara Broggini</strong> (3-32) really tested the idea of neuronal resonance by entraining V1 neurons with optogenetic stimulation and seeing their input-output function, as well as how downstream V2 neurons respond. Surprisingly, she finds that V1 neurons don’t have gamma-range narrow-bandpass filtering properties (as previously reported) when input is directly injected via opto, and that entrainment is most effective at lower frequencies. This seems consistent with the fact that globally synchronized states usually exhibit low-frequency oscillations, and also calls into question the role of gamma in interareal communication (i.e., communication through coherence). <strong>Joseph Rudoler</strong> (1-52) and <strong>Mila Halgren</strong> (1-118) both had posters looking at the 1/f exponent in power spectra of human intracranial electrophysiology recordings, and I’m very much here for this (and secretly hope Cosyne will one day be taken over by weird 1/f stuff). (A quick-and-dirty 1/f-exponent fit is sketched after this list.)</p>
</li>
<li>
<p>On the modeling side, <a href="https://youtu.be/m4w62iPv6yc?t=11398"><strong>Shiva Lindi</strong></a> talked about some heavy duty mechanistic modeling work on how the corticostriatal system generates beta oscillations. Tour-de-force exploration of both rate and spiking neural networks, and as per the norm with this type of work, lots of model parameter research is required to produce simulations that match experimental data (would be mighty nice if somebody could build a machine learning tool to assist with this kind of model discovery…). <a href="https://www.world-wide.org/cosyne-22/inhibitory-network-model-explains-transient-dc2e60f0/"><strong>Natalie Schieferstein</strong></a> (1-120) uses spiking neural network models and mean-field approximations to study ripple oscillations in CA1, proposing a novel inhibition-driven mechanism that can explain some peculiar observations behind sharp-wave driven ripple frequency drifts—neat because she uses the empirically observed phenomenon as a criterion for model selection. <strong>Julia Wang</strong> (3-45) had a poster on using variational autoencoders to detect low-dimensional and interpretable latent states from LFP and EMG during sleep, which nicely correspond to awake, slow-wave sleep, and REM sleep. Really nice approach and something I will definitely draw inspiration from. <a href="https://youtu.be/m4w62iPv6yc?t=13123"><strong>Lia Papadopoulos</strong></a> talked about her work on modeling the effect of arousal on the (metastable) dynamics of clustered networks. If you imagine separate neural population attractor states to be encoding different stimuli (like different sounds), then perceiving these stimuli accurately over time requires the network to quickly switch between these “metastable” states. Lia proposes a beautifully intuitive explanation for how arousal essentially lowers the energetic barrier between these states, such that optimal (de)coding is achieved at moderate arousal, when the network is at a balance between coding flexibility and fidelity. Finally, <a href="https://youtu.be/-7gNchGxI9s?t=10151"><strong>Merav Stern</strong></a>, also working with Luca Mazzucato, talked about how heterogeneity in neuronal timescales can arise from clustered networks. As a perfect segue into the next section of this blog post, she describes a potential mechanism that could explain the often-observed distribution of timescales across cortical circuits, which is simply to have a network with a distribution of cluster sizes. Really cool work, and since I just found it for myself, here’s the <a href="https://doi.org/10.1101/2021.10.11.463861">preprint</a>.</p>
</li>
</ul>
<hr />
<hr />
<h2 id="section-ii-meta-thoughts-on-conferencing">Section II. Meta Thoughts on Conferencing</h2>
<p>This was my third Cosyne, and the first now that I’ve been a postdoc for a full year. You’d think I’d know what’s going on by now, and in some sense, I do: show up, get overwhelmed by the firehose of scientific content (especially the midnight posters man, damn), try to go out and socialize as much as possible, try to wake up before mid-day, rinse and repeat. Like I mentioned, there are lots of conventional wisdoms that get passed down about how to do conferences, ranging from “don’t go to the talks, posters are the only place where you get to engage” to “conferences are about drinking and meeting people”. Most of these wisdoms are conditionally true, meaning that they are true <em>if</em> you, the attendee, satisfy some criteria. These criteria vary depending on your career stage and goals, your personal interest, your personality, and more. Imagine telling a first-year PhD student without a concrete project that “conferences are for networking”; that would be a bit bizarre—even if you made lots of good contacts, what would you do with them? I had some great times as a PhD student going to literally every event on the program at SfN and Cosyne and then just going to bed at a reasonable time, but that would certainly not suit my career stage now (and also I just don’t enjoy it anymore).</p>
<p>Maybe I’m the only idiot with this problem, and it boils down to not knowing myself well enough, but I feel like whenever there is a schedule and a mob involved, I just default to following the schedule and the mob. The thing about conferences is that the schedule runs from 8am to 8pm (or midnight…), and some subset of the mob is drinking or doing other fun stuff from noon to 4am. It’s literally impossible to do it all, and even if you somehow managed, your mental and physical health would probably be rapidly deteriorating. I know a handful of people who somehow have the energy to drink till 4am and show up to SfN morning posters at 8am, and I tip my hat to them, but I’m not that guy. I actually used to get sick after every conference trip, because my immune system had probably quit by the time I got to the airport, after 5 days of intense mental and physical exhaustion. But if I didn’t do it all, I always felt bad about not taking full advantage of the scientific opportunity, while simultaneously experiencing fomo as everybody else skipped the afternoon and explored the conference city. It’s lose-lose-lose. So yeah, in some sense, I’ve been around long enough to get what’s going on now, and I survived the years of passively getting pulled into various currents and then feeling bad that I didn’t get pulled into a different current; I just don’t really enjoy it anymore.</p>
<p>If you are one of those people who know exactly what they want to get out of a conference and don’t care about anything else on the schedule or who’s doing what, then stop reading now, because the rest is going to be obvious. And just to be clear, I don’t think I’m offering any groundbreaking insight here, just hindsight common sense. In fact, I’m not even offering much insight about what to do at a conference. All I’m saying is that, if you’re some kind of passive completionist like me, one of those people who would go through every branch in a choose-your-own-adventure book, this entirely overwhelming endeavour can be a much better experience when you approach it from a different perspective, and the key is realizing that, like many aspects of life, you simply <strong>cannot do it all</strong>, and in order to not exhaust yourself <em>and</em> feel like a failure, you have to <strong>adjust your expectations and set priorities</strong>.</p>
<p>On the flight to Lisbon, I miraculously had the lucidity to realize that I’m not a PhD student anymore (it’s like waking up from one of those nightmares about missing an exam), and that maybe I should approach this thing differently? Then I quickly realized that I didn’t know how to approach it differently, because I hadn’t been approaching conferences with an intention <em>at all</em>. So I quickly jotted down some goals / priorities. My number 1 priority was to enjoy my time there, which meant giving myself the permission to take a timeout anytime I needed, including from other people, and trying to maintain at least some of my daily routines from home, like meditating, stretching, and short workouts every day. I managed to do this on about half the days, and I certainly didn’t enjoy every single day of the conference, but overall, I am way happier about the fact that I didn’t completely get wrecked by the 5 days and managed to keep some semblance of a routine (and also took advantage of some local and conference hotel amenities…). Other than that, I had some “conference science” goals, like learning 3 cool / unexpected things every day, as well as something directly relevant to my projects. The reason for the latter is obvious, but the former is what I actually really enjoy about big conferences, i.e., learning about stuff that I would otherwise never seek out. Section I of the blog post is the result of such attempts. I had roughly similar goals for “networking”, but with the addition of meeting some people that I really get along with but don’t necessarily have any scientific overlap with, aka anti-“networking”. I won’t go through all of them, but you get the gist. If I were a starting PhD student, I would probably prioritize seeing as much stuff as possible just to get maximum inspiration about what’s going on in the field. As a postdoc, you have to balance that with optimizing for future opportunities, both in and outside of academia. I’d imagine that as an established PI who doesn’t need too many new directions or opportunities for collaboration, you’d just want to catch up with people you don’t get to see often, and maybe talk to some journal editors. I was pretty lucky that my poster got rejected and I only had to give a very short intro for the workshop; otherwise, all of this would be moot because I’d probably have been skipping everything to prepare my poster last minute.</p>
<p>Anyway, the point is just that it’s important to balance science, socializing, and rest / self-care, and this is definitely not in the “Science for Dummies” manual. All three are important and can come in different forms, and one naturally wants to prioritize the first two, being at a conference, and the first in-person one in ages at that, but it’s really important for me to not burn out and just be cranky all the time. At the same time, the fomo is real: it’s hard to say no to science or partying when I just need some alone time, but it’s important to distinguish fomo from “I actually really want to be there”. Having some explicitly defined priorities turned the same objective experiences from guilt-inducing stressors into a feeling of actually accomplishing stuff on my checklist. Wow, I just rediscovered the glass-half-full parable. At this point, I’m still trying to figure out a mode that works for me. It helped a lot to have defined some concrete goals at the beginning; I didn’t meet all of them, but I made an effort, and it shaped how I consciously approached the thing.</p>
<p>To close, I’m gonna stream of consciousness a bit, and also talk about how strange of a social custom conferences are. Thousands of people show up at the same place, and for the next 5 days or so, everything within this bubble is all that matters. This is obviously a lot of fun and scientifically important, but sooner or later I get this inescapable feeling that I don’t care about much of this stuff, and none of it matters anyway, especially when you zoom out to a larger context of global events, like the ongoing pandemic and a full-on invasion in Ukraine, in addition to the mundane things we deal with in our regular daily lives. I don’t mean to be disparaging to people who can just focus on the science at the conference; in fact, I’m envious: I came all this way, and there are so many people and learning opportunities around me, so why not enjoy it for what it is? But it was also really great to then talk to people who felt very similarly, so we could have a laugh about the quiet absurdity of the whole thing. Maybe I’m not super committed to my science, or science in general, but I feel like when I meet people, I almost never want to talk about science as the first thing. I don’t know why, and it’s really weird because, obviously, it’s a contextually relevant conversation starter. It’s not that I never enjoy talking to people about science, not at all. But I guess it’s a lot easier when I already know them a bit as people.</p>
<p>To push this even more, I think a key objective at conferences is not necessarily to find relevant science, especially in an age where everything is online. In fact, there’s too much stuff online and too little time to consume it all. All the talks are streamed, and most posters are online and curated, with preprints to boot. I think the key is to meet people you enjoy spending time with, who are on the same wavelength and with whom you might then have fun working or talking shop. For me, this often manifests as quiet real talks in a loud place, around some beers, but it’s different for different people—figure out what you enjoy. I also had a great time surfing (aka getting completely bulldozed in a storm) with some strangers, and then got a nice lunch and chat afterwards about completely random life stuff. Of course, as a dude with no real life responsibilities who enjoys drinking beer and getting turnt, I’m pretty lucky that a lot of other scientists also like doing that, so “networking” sometimes happens spontaneously when you end up completely shitfaced at a late-night soup restaurant. If you’re someone who doesn’t enjoy that, you’d probably have to work a bit harder, but a nice lunch and stroll is always possible, especially if you can open up and connect without being lubed by alcohol. I thought the conference app where people self-organized events like climbing, birdwatching, and surfing was a great addition to make the casual networking a bit more equitable.</p>
<p>And with that, onto <strong>neuronal timescales!</strong></p>
<hr />
<hr />
<h2 id="section-iii-workshop-report-mechanisms-functions-and-methods-for-diversity-of-neuronal-and-network-timescales">Section III. Workshop Report: Mechanisms, functions, and methods for diversity of neuronal and network timescales</h2>
<p>This whole workshop-planning thing started quite spontaneously: I saw the call on Twitter maybe a week before the deadline, retweeted into the void and tagged <a href="https://twitter.com/roxana_zeraati">Roxana</a> to see if she would be willing to do this crazy thing together, and 4 months later, we’re in Cascais, sitting at the front of the meeting room as co-organizers in front of some dope speakers.</p>
<p><img src="/assets/images/blog/2022-03-28-speakers.png#center" alt="fig_speakers" />
<em>Note: this slide was from the introduction I gave, updated with our two heroic speakers that subbed in on the day of, circled in green.</em></p>
<p>I never imagined myself organizing a workshop, much less at Cosyne. Never mind the imposter syndrome; this just feels so…grown up? Luckily, the workshop organization wasn’t too much work after we had written up the proposal, especially with several of the speakers volunteering themselves after seeing the tweet. I don’t know if this makes the selection process more equitable, but it certainly made our lives easier. Nevertheless, we filled the rest of the roster while trying to optimize for diversity along a few different dimensions, including gender and career stage, as well as to cover the huge breadth of topics that now make up the “field” of neural timescales.</p>
<p>And holy shit it’s a <strong>big wide field</strong>.</p>
<p>To provide a bit of context, here was our thought process while deciding on the topics, which became the workshop title:</p>
<p><img src="/assets/images/blog/2022-03-28-wstopics.png#center" alt="fig_wstopics" /></p>
<ul>
<li><strong>diversity</strong>: we are still amassing data from different brain regions in different model organisms, under different task constraints and from different recording modalities, to see just how timescales vary across space and time. Much of this data attempts to connect with cortical hierarchies (or gradients), as per <a href="https://www.nature.com/articles/nn.3862">Murray et al. 2014</a> (shoutout to the OG), but also to characterize heterogeneity more generally. Then, we chose 3 broad themes: <strong>function, mechanism, and methods</strong>.</li>
<li>First, <strong>function</strong> refers to behavioral or “computational” relevance, i.e., why is a diversity or gradient of timescales useful for the organism? The cookie-cutter response that I personally always give is: well, the environment is dynamic and has many temporal hierarchies, so the brain should have the same in order to keep track of them. But this is quite vague, since it doesn’t specify where, how, and what precisely the relationship between the two is. For example, we can remember stuff from many years ago; does that mean there are neuronal dynamics whose timescale is on the order of years, or is there some kind of conversion factor?</li>
<li>Second, <strong>mechanism</strong>: what are the biological mechanisms that give rise to the heterogeneity of timescales? This can include synaptic, cellular, and network factors that all mix together, which can produce unexpected behaviors. Of course, one has to be careful in linking specific mechanisms to specific observations at different spatial resolutions, which is to say, single neuron spike train timescales probably arise from a different mechanism than population-level timescales, which also differ from membrane timescales, all of which are important and potentially (causally) influence one another.</li>
<li>Lastly, and implicit in all of the above, are the <strong>methods</strong> that drive these investigations, which include both analysis and modeling methods. Fundamentally, are we measuring the same (biological) quantity if the algorithms we use to arrive at those final numbers drastically differ? And this is before even considering modality differences (e.g., continuous time series vs. point processes). This is something that’s good to catalogue, even if we cannot hope to be prescriptive in standardizing them. On the other hand, computational models are always a great model organism for investigating mechanisms that are infeasible to dissect in biological organisms, so how do we leverage artificial networks—be it spiking, rate, or deep RNNs—to study these same questions above? (For a toy illustration of the analysis side, see the short sketch right after this list.)</li>
</ul>
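<p>Before diving into the talks, here is a minimal sketch of the most common recipe for measuring a neural timescale: simulate a process with a known timescale, compute its autocorrelation, and fit an exponential decay to it. This is just my own toy example (assuming numpy and scipy; the AR(1) process, lag range, and parameter values are arbitrary choices), not the pipeline of any particular paper or talk mentioned here.</p>

<pre><code class="language-python">import numpy as np
from scipy.optimize import curve_fit

# Simulate an AR(1) process with a known timescale:
# x[t+1] = phi * x[t] + noise, so its autocorrelation decays as
# exp(-lag / tau) with tau = -1 / log(phi), in units of time steps.
rng = np.random.default_rng(0)
phi, n_samples = 0.95, 100_000
tau_true = -1.0 / np.log(phi)

x = np.zeros(n_samples)
for t in range(n_samples - 1):
    x[t + 1] = phi * x[t] + rng.standard_normal()

# Empirical autocorrelation function up to some maximum lag.
def autocorr(sig, max_lag):
    sig = sig - sig.mean()
    return np.array([np.dot(sig[:len(sig) - k], sig[k:]) / np.dot(sig, sig)
                     for k in range(max_lag)])

max_lag = 100
lags = np.arange(max_lag)
acf = autocorr(x, max_lag)

# Fit a single exponential decay to the ACF to estimate the timescale.
tau_hat, _ = curve_fit(lambda lag, tau: np.exp(-lag / tau), lags, acf, p0=[10.0])
print(f"true tau: {tau_true:.1f} steps, fitted tau: {tau_hat[0]:.1f} steps")
</code></pre>

<p>In real data, of course, the interesting (and problematic) part is everything this sketch sweeps under the rug: trial structure, low spike counts, multiple timescales, oscillations riding on top of the decay, and so on, which is exactly where methods like abcTau (covered in Roxana’s talk below) come in.</p>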
<p>You can probably write a whole book, or at least a 10-page review paper, to cover all the works that touch on the above in the last 10 years alone (see <a href="https://pubmed.ncbi.nlm.nih.gov/34026949/">here</a> and <a href="https://www.frontiersin.org/articles/10.3389/fncir.2020.615626/full">here</a> for very relevant discussions). In the workshop, we got a chance to see some of the newest works in the last couple of years. I’m not gonna be modest about it: <strong>it was the best workshop I went to that day (but also probably in general)</strong>. Not only did our speakers cover the incredibly broad spectrum of topics relating to timescales, but there were also such rich and unexpected intersections between works that it really felt to me like a coherent and self-organizing emergent entity. It’ll be hard to not completely butcher their findings (so feel free to correct me), but in the interest of space, I will just briefly summarize the key points from each talk below, as well as some of the discussions and takeaway points we had.</p>
<p><em>Note that I mostly keep to third person throughout to avoid switching back and forth between the speaker and their team, just for convenience, but in all cases it was acknowledged just how much of a team effort it really was, from lab techs and research assistants all the way up to the PI.</em></p>
<hr />
<h3 id="summary-of-talks">Summary of talks</h3>
<ul>
<li>
<p><a href="https://mobile.twitter.com/lucasmpinto"><strong>Lucas Pinto</strong></a> started the day by blowing my mind with some experimental data that continues to remind me to put some respect on the Experimentalist’s name. In his virtual reality setup, mice have to make a left-or-right decision based on always-present or transient visual information as they run down a virtual track, depending on the specific task. During this, he has the ability to do focal optogenetic inactivation, <strong>but cortex-wide</strong>. This means that during any part of the mouse’s run, he could turn off any part of the brain in a systematic way to see how different regions contribute to, and what their timescales of involvement are in, e.g., visual perception (seeing the pillars), evidence accumulation (”counting” and remembering the quantity), or decision-making (recalling and acting based on the information). More concretely, the question is: if you shut down a part of the brain momentarily, how much does that screw up the mouse’s performance on the task, and for how long into the future is this deficit present for? Somewhat surprisingly, he showed that shutting off almost any part of the dorsal cortex will induce a performance deficit, suggesting that these “cognitive processes” involve multiple brain regions. However, how much and for how long the deficit lasts for depends on the inactivated brain area. Something about this systematic casual manipulation of the brain to interrogate cognitive faculties really blew me away, because you can start to disentangle, among other things, “when” vs. “for how long”, and this is without recording from a single neuron…but, of course, he also showed wide-field calcium imaging data from many cortical areas and recover a hierarchy of activity timescales. There is so much more, so you can check out the tasks <a href="https://www.sciencedirect.com/science/article/pii/S0896627319307317">here</a>, and the main findings <a href="https://www.biorxiv.org/content/10.1101/2020.12.28.424600v2.abstract">here</a>, and we even got to see some super cool preliminary data on timescales across cortical layers.</p>
</li>
<li>
<p>Going from mice to monkeys, <a href="https://mobile.twitter.com/anamanea10"><strong>Ana Manea</strong></a> showed us some ultrahigh field fMRI data from the monkey brain and related them to the classical single-unit timescales, as well as to functional connectivity gradients. I’m a huge fan of this type of work that bridges modalities, and her data showed that the spiking timescale hierarchy is preserved in fMRI, though I’m very curious what the explicit scaling factor is and whether that’s consistent between macro regions. The power of fMRI, of course, is that you can now look across the whole brain with high spatial resolution, and she sees a smooth gradient (e.g., across the dorsal visual pathway), and not only across the cortex, but in the striatum as well. This is really nice to see, but if you think about it, it doesn’t have to be this way at all, because the temporal dynamics of single-unit or population spiking could be totally different from that of hemodynamics recorded via fMRI, so I really wonder what drives this consistency. On top of that, she showed that functional connectivity gradients (a hot topic in human resting-state fMRI) are correlated with the timescale gradient, wrapping up a nice story connecting spikes, BOLD, and (functional) anatomy. <a href="https://elifesciences.org/articles/75540#content">Her paper</a> with all the details is hot off the press, so check it out (it’s also super interesting to read the open reviews). One thought I had while listening to her talk was the following: two autocorrelated signals tend to show a stronger pairwise correlation just by chance, and these functional connectivity gradients are typically taken as the singular vectors of the resting state BOLD covariance matrix. So how much of the functional connectivity can be expected from the signal statistics of the univariate BOLD autocorrelation alone? (For the basic statistical intuition, see the little simulation sketch right after this list of talks.) Beyond fMRI, she also had some spiking data from the “top” of the hierarchy, including frontal and cingulate regions, but I won’t cover that here other than to say I’m very excited to see more spike-LFP timescale comparisons.</p>
</li>
<li>
<p>From there, <a href="https://scholar.google.com/citations?user=wPpGhg4AAAAJ&hl=de"><strong>Lucas Rudelt</strong></a>, who was thankfully able to tag in for his advisor, Viola Priesemann, continued with the theme of crossing scales. He also threw the first punch, so to speak, by introducing the concepts of criticality and scalefreeness. Their work starts from a complementary and first-principles approach by positing a general process of activity propagation (i.e., a branching process), which models how much influence a given “node” in a network (e.g., a neuron) has on its downstream nodes. This influence can be parameterized by a number called the branching ratio, or in their specific case of a neural network, neural efficacy (memory is a bit fuzzy here but I think it’s the same conceptually). Intuitively, a network with low efficacy does not propagate activity very far, or for very long, resulting in shorter timescales. You might expect the converse to also be true, that networks with high efficacy would have long activity timescales. However, among many of the results Lucas talked about, one surprising observation is that it’s a lot more complicated: when networks are set to an efficacy near 1 (otherwise known as the critical regime), timescales <em>can</em> very quickly become longer, but are also more <em>variable</em>. In fact, it’s not great if the branching ratio is <strong>at 1</strong>, because then the propagation would explode, so it’s more reasonable to expect that neuronal networks operate slightly below criticality in order to balance information propagation and stability, and this slightly sub-unity region allows the balance to shift dynamically and flexibly (a back-of-the-envelope version of this branching-ratio intuition is sketched right after this list of talks). Furthermore, he makes a distinction between intrinsic timescale and information predictability, and finds that timescale increases along the visual cortical hierarchy (in agreement with the original findings <a href="https://www.nature.com/articles/s41586-020-03171-x">in mouse Neuropixels recordings</a>), but information predictability decreases. This is quite puzzling as it contradicts the (more intuitive) notion that longer timescales would translate to higher predictability into the future, since things don’t change as quickly. You can find some examples of their work on this topic <a href="https://academic.oup.com/cercor/article/29/6/2759/5476016">here</a>, <a href="https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008927">here</a>, and at Cosyne 2022 poster 3-036.</p>
</li>
<li>
<p><a href="https://twitter.com/DrBMunn"><strong>Brandon Munn</strong></a> further represented the scalefree perspective with a tour-de-force overview of his PhD and postdoc works in Sydney. Well, it was both cross-scale and scalefree, in every sense of those words, as he reviewed some earlier works that span from local spiking analysis (<a href="https://physoc.onlinelibrary.wiley.com/doi/full/10.1113/JP278935">looking at 1/f PSDs</a>!) to macroscale modeling of the neuromodulatory and thalamocortical systems. Of the latter work done <a href="https://www.sciencedirect.com/science/article/pii/S1053811920307102">jointly with Eli Müller</a>, he presented some really interesting results that linked cortical timescales (of fMRI signals) with matrix vs. core populations in the thalamus. <strong>Very</strong> broadly speaking, the thalamus has two types of projections to the cortex: “core” populations have precise targets in the cortex, while “matrix” populations have diffuse cortical projections. Among the many thalamus-cortex associations they find, the most topical was that the cortical gradient of timescale correlated with the level of core vs. matrix projections, i.e., regions with more matrix projections have longer timescales. This brings yet another complicating factor to the mechanisms discussion: in addition to single-cell properties and feedforward- vs. recurrent-dominated local connectivity patterns, the thalamic input may also play a direct role in shaping the “intrinsic” timescale—you might ask yourself at this point, what’s intrinsic at all anymore about intrinsic timescales? Just because monkeys and humans were not enough, Brandon also talked about some exciting new analyses he did with whole brain calcium recordings from zebrafish larvae and Neuropixels recordings from mice. Without delving into the details, he used a procedure called coarse-graining to lump more and more neurons together to see if population timescale lengthens with the number of neurons you pool together. Indeed, it does, and at the risk of putting words into his mouth, I think this raises the possibility that long timescales are maintained not by a specific neuronal population (say, from the association areas), but simply by circuits with integrated activity from <strong>more</strong> neurons. More broadly, this loops back to the idea of scalefreeness, where spatial correlations scale with temporal correlations, i.e., smaller populations, be it neurons or sand grains, sustain events of shorter durations, and vice versa.</p>
</li>
<li>
<p>To close up the morning session, <strong>Roxana Zeraati</strong>, my co-host, presented her recent work on a new method for robust estimation of timescales, as well as its application to characterize timescale changes in the monkey brain under attention manipulations. Fun story: I was asked to review her method paper more than a year ago, and that’s how I got to know her and her work in the first place, which led to the idea of co-organizing this workshop. The paper is <a href="https://www.nature.com/articles/s43588-022-00214-3">now published</a> (along with a nice python package, abcTau) so you can just go check it out, but briefly, she tackles the problem of biased estimation of the decay time constant when fitting exponentials to the autocorrelation function, due to various factors such as low spike count, short trial duration, etc. Ideally, we would want both a less biased estimate and a quantification of the uncertainty, especially when the underlying process has multiple timescales. Her approach applies the framework of approximate Bayesian computation (ABC), which has a more modern synonym in <a href="https://www.pnas.org/doi/10.1073/pnas.1912789117">simulation-based inference (SBI)</a> (which, funny enough, is what I work on now with Jakob): ABC (or SBI) takes a generative model, runs many simulations with different parameter configurations, and accepts the parameters that successfully generate simulated data matching the observed data as the “true” generative parameters. Actually, many methods fit this description, including naive brute-force search, and ABC methods essentially cast the problem into a Bayesian setting (i.e., posterior estimation) and do it in a more efficient way. I won’t bore you with the details, and she actually used most of her talk to showcase the application; I just think it’s a nice method, and it’s obviously very related to the stuff I do now. But the <a href="https://www.biorxiv.org/content/10.1101/2021.05.17.444537v1">empirical findings</a> are just as nice: using abcTau, she was able to parse two timescales from single-unit recordings from monkeys doing a selective attention task. She finds that the fast timescale on the order of 5ms (membrane? synaptic?) does not change with attention demands, but the slower one on the order of 100ms does, and builds a computational model to suggest that between-column interactions across the visual cortex can explain the slow timescale change.</p>
</li>
</ul>
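<p>A quick aside on the question I raised about Ana’s functional connectivity gradients: here is a tiny simulation of the underlying statistical issue (my own toy example, assuming numpy; it is not an analysis from her paper). Two completely independent signals show larger and larger spurious correlations the more autocorrelated each of them is, simply because smooth signals carry fewer effective samples.</p>

<pre><code class="language-python">import numpy as np

rng = np.random.default_rng(1)

def ar1(phi, n, rng):
    # Generate one AR(1) time series with autocorrelation parameter phi.
    x = np.zeros(n)
    for t in range(n - 1):
        x[t + 1] = phi * x[t] + rng.standard_normal()
    return x

# Correlate many pairs of *independent* AR(1) signals: any correlation is
# spurious, but its typical magnitude grows with the autocorrelation.
n_trials, n_samples = 2000, 200
for phi in [0.0, 0.8, 0.95, 0.99]:
    r = np.array([np.corrcoef(ar1(phi, n_samples, rng),
                              ar1(phi, n_samples, rng))[0, 1]
                  for _ in range(n_trials)])
    print(f"phi = {phi:.2f}: typical spurious correlation (SD of r) = {r.std():.3f}")
</code></pre>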
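<p>And on the branching-ratio point from Lucas’ talk, the basic intuition for why timescales blow up near criticality fits in a few lines (a back-of-the-envelope sketch under a drastically simplified linear model, not their actual framework): if activity at each time step is, on average, a fraction <em>m</em> of the activity at the previous step plus noise, the autocorrelation decays as <em>m</em> raised to the power of the lag, i.e., with a timescale of -1/log(<em>m</em>) time steps, which diverges as <em>m</em> approaches 1.</p>

<pre><code class="language-python">import numpy as np

# Timescale of a linear branching / AR(1) process as a function of the
# branching ratio m: autocorrelation(lag) = m**lag = exp(-lag / tau),
# so tau = -1 / log(m), which diverges as m approaches 1 (criticality).
for m in [0.8, 0.9, 0.99, 0.999, 0.9999]:
    tau = -1.0 / np.log(m)
    print(f"branching ratio m = {m}: timescale ~ {tau:.0f} time steps")
</code></pre>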
<p>By this point, I was pretty thankful that it was lunch time. If you’re counting, in these first 5 talks, we’ve had 5 different methods for computing neural timescales and even more model organisms. We also saw that timescales not only vary across the cortical hierarchy in a “static” way, but change across layers and over time (whose rate of change can also change), are related to different potential mechanisms (from connectivity to variation in thalamic projection) and cognitive processes (from decision-making to attention), and, just for fun, could potentially scale in a scalefree manner in the goldilocks zone of quasi-critical dynamics—and this is just a tiny summary of the data we’ve seen so far. Good thing we got a nice lunch break together and a quick stroll on the beach, which might have been my favorite part of the workshop (more on this later). But back to the science: in the afternoon session, we were treated to 4 talks with 4 entirely different kinds of computational models, each of which was used to study similar questions of heterogeneity, function, and mechanisms.</p>
<ul>
<li>
<p>So far, we’ve talked about neural timescales as useful for implicitly tracking timescales in the environment. <a href="https://twitter.com/mBeiran"><strong>Manuel Beiran</strong></a> started the afternoon by making this proposed function very explicit, investigating the conditions and mechanisms that allow rate RNNs to not only learn examples of durations, but to generalize to unseen ones. The task is straightforward: a network receives an input that encodes the intended interval (either via an amplitude, or a delay between two pulses), and after a Go-signal, is asked to produce a ramping output for just as long. Unsurprisingly, RNNs can learn the examples fine, and could even learn to interpolate across unseen durations within the bounds of the training examples. However, networks with full-rank connectivity have a hard time extrapolating, whereas networks whose recurrent weight matrix is low-rank could extrapolate (with some help from a context cue). Looking at the dimensionality of the network dynamics, it appears that the low-rank networks essentially keep to a low-dimensional manifold (!!) whose geometry is fixed, but the speed at which the dynamics unfold along the manifold changes for the different durations. Relating back to the overall theme, Manuel’s talk suggests that connectivity constraints could be a useful way for networks to be flexible and generalizable in tracking time(scales), instead of falling into tailored solutions for individual timescales. Note that the full-rank networks probably <strong>can</strong> find the same solutions, since they are a superset of the low-rank networks, but the latter have an easier time reaching these solutions (for whatever reason). <a href="https://www.biorxiv.org/content/10.1101/2021.11.08.467806v1">The preprint</a> has all the information plus more. I also have to mention Manuel’s <a href="https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006893">older paper</a> on disentangling the contribution of adaptation vs. synaptic filtering to network timescales, which I only learned about recently but which has some quite nice (and surprising) results.</p>
</li>
<li>
<p>Going from rate to spiking networks, <a href="https://twitter.com/AlexVanMeegen"><strong>Alex van Meegen</strong></a> delved further into mechanisms by providing a theory to predict single neuron timescales. I’m not gonna lie, there was a lot of <strong>heavy-duty</strong> math that I basically have no hope of understanding, probably ever. But Alex framed his talk quite intuitively, and the question he poses is deceptively straightforward: can we predict the timescale of single neuron spiketrains given neuronal parameters and network connectivity, as well as the temporal statistics of the external input? In particular, he highlighted the question of how neurons embedded in a network can acquire much longer timescales than set by their membrane time constants. The contribution of the work is that, instead of brute-forcing it numerically, he worked out a theory that <em>analytically</em> connects underlying parameters to observables. If I understand correctly, the dynamical mean-field theory “squishes” the network of neurons into one “big neuron” whose distribution of input and output statistics (e.g., mean firing rate, ISI, and autocorrelation) can be described by stochastic differential equations, and squishing them this way is okay because the input and output are “self-consistent” (I’ll stop here before I embarrass myself more). He applies it to several different types of neuronal models, including generalized linear models with various nonlinearities, as well as leaky integrate-and-fire neurons with different connectivity structures, and finds good agreement with simulations. All the findings are in the recently <a href="https://journals.aps.org/prresearch/abstract/10.1103/PhysRevResearch.3.043077">published paper</a>, which he did not have time to fully cover in a 20-min talk. One point he stressed was that the theory only predicts the average timescale (and distribution of quantities) across all the neurons in the population, <em>not the timescale of the average population activity</em>, which are markedly different (e.g., Fig. 10 e vs. f in the paper). I found this particularly interesting because the former corresponds to single-neuron timescales measured via spiketrain autocorrelations (e.g., in Murray et al., 2014), while the latter resembles something more like the LFP, or more directly, summed pre-synaptic input into the neuron. But these two quantities are supposed to be self-consistent, so the synaptic or neuronal membrane filter is doing a ton of work to make this happen?</p>
</li>
<li>
<p><a href="https://neural-reckoning.org/nicolas_perez.html"><strong>Nicolas Perez-Nieves</strong></a>, heroically stepping in remotely an hour before his supervisor, Dan Goodman (who seemed to have had just the worst luck in the days leading up to Cosyne), was schedule to talk, took yet another complementary modeling approach. In his work, he trains spiking neural networks to do a range of classification tasks with temporal structures—a difficult feat in of itself that requires a clever surrogate gradient descent technique (but check Dan’s Cosyne tutorial!). Typically, training such networks, whether spiking or rate-based, means adjusting only the connection weights in order to optimize for task performance, but not here. Nicolas asks whether heterogeneity in single-neuron parameters, specifically their membrane time constants, can improve task performance. In this context, heterogeneity means that every neuron in the recurrent network is allowed to take on a different value for its time constant, instead of all being the same. This heterogeneity is implemented either through initialization alone (and then fixed), through learning, i.e., the neuron-to-neuron weights, along with single-neuron time constants, are learned through back-propagating the task performance loss, or both. This technically sophisticated (and apparently computationally intensive) setup lead to some very intuitively satisfying results: learned heterogeneity in single neuron time constants led to better performance in all the tasks, and their distribution roughly matches experimentally observed gamma distributions of membrane time constants measured in real neurons. In addition, a small number of neurons seem to consistently acquire very long timescales, e.g., 100ms compared to the median of ~10ms. There are a lot of interesting things to consider when extrapolating from these results to the real brain. For example, neurons in the brain probably don’t (and can’t) tune their membrane time constants for each task they face, but the heterogeneity already exists, or at least changes on an evolutionary timescale—so then how is this heterogeneity taken advantage of per task? Does this help explain functional specialization of different brain areas, since neuronal (and network) properties in different regions are more or less defined after early development? They touch on both points (and more) in the recently <a href="https://www.nature.com/articles/s41467-021-26022-3">published paper</a>.</p>
</li>
<li>
<p>Last, but not least, <a href="https://twitter.com/vvobot"><strong>Vy Vo</strong></a> and <a href="https://twitter.com/shaileeejain"><strong>Shailee Jain</strong></a> gave a jointly pre-recorded talk about a set of very fascinating and interdisciplinary works they did on augmenting machine learning-style RNNs (i.e., LSTM) (<strong>with timescales!</strong>) to better perform language tasks, and then using them as a model of natural language processing in the brain. They start from the observation that natural languages, like many other processes in our environment, have a hierarchy of timescales (e.g., phonemes to words to sentences). RNN-based language models usually use a gated “neuron”, like the long short-term memory (LSTM) unit, which has an internal memory that decays with some time constant, which is useful for capturing long-range dependencies. Because I’m learning German, it’s the only example I can think of: a question usually has the verb—a <strong>critical</strong> piece of information—at the very end. To parse a question, you would then need to “hold onto” information, like the subject and object and wheres and whens, until the very end, when it’s clear what the requested action is. In the first part, <a href="https://iclr.cc/virtual/2021/poster/3095">they show</a> that when LSTM units are assigned a distribution of timescales mimicking that of natural language (i.e., power law), these “multiscale” networks result in better language modeling performance, especially for rarer words. Pretty cool already, but <a href="https://proceedings.neurips.cc/paper/2020/hash/9e9a30b74c49d07d8150c8c83b1ccf07-Abstract.html">it gets better</a>: they can now give the same text input to the network as what a human reads inside an MRI scanner, and compare network activation to brain activity. Specifically, they try to predict the BOLD timecourse at each voxel in the human brain using a weighted combination of the activation of all the units in the RNN. Surprisingly (at least to me), they find that activations of voxels in the temporal lobe (language-y bits?), prefrontal cortex (think-y bits?), and the precuneus can be predicted particularly well. What’s more, they extract an “effective timescale” for each voxel by taking a weighted average of the timescale of the units in the RNN: if a unit is particularly predictive of the activity in a voxel, that unit’s timescale is weighted highly. Through this procedure, they find that the auditory cortex has short timescales, which increase as you move towards parietal areas (TPJ), and there’s a similar gradient along posterior-to-anterior PFC. Some natural questions arise here: how do the RNN-derived voxel timescales compare to those computed based on the BOLD ACF (like in the previous talks), specifically, how are they different, and how does the resting state timescale compare with these task-derived ones? Anyway, I think this is a super cool intersection of machine learning and neuroscience research (as well as collaboration between academic and industry labs), where the ML model actually benefited from a brain-inspired architecture while also assisting in explaining brain data.</p>
</li>
</ul>
<hr />
<h3 id="a-lot-of-questions-moving-forward">A lot of questions moving forward</h3>
<p>I think my brain was properly scrambled, or rather, had exploded and smeared across the workshop meeting room after that whole day of talks (and I said about as much in our final discussion). It’s the feeling of suddenly experiencing so many new things that one has a tough time keeping track of any single discussion, let alone how they intersect with one another. Fortunately, it seemed like I was not the only one, and people mostly shared the sentiment that this was a <em>good thing</em>: that the talks and discussions at the workshop generated so many new leads and potentially new perspectives that it’s hard to consolidate them into a coherent sequence of thoughts. In the two weeks following, I was able to take some time to digest these new ideas, and going over my notes for all the talks (and their associated papers) for this blog post really helped in picking up the pieces (of my brain) after the dust settled, and I find them falling into the same themes we set out at the beginning. I hereby present to you the pieces of my brain, which, of course, are heavily inspired by the workshop speakers and attendees, and in particular recent conversations with <a href="https://twitter.com/matteosaponati">Matteo Saponati</a> and <a href="https://twitter.com/AlanaDarcher">Alana Darcher</a>:</p>
<ul>
<li>
<p><strong>methods for measuring timescales:</strong> implicit but central in all the discussions we’ve had is a reliance on faithful measurements of timescale, and we already saw a very concrete demonstration (in Roxana’s talk) of how naive loss-minimizing estimates from fitting an exponential decay function to the autocorrelation may be biased, as well as how more sophisticated methods (or fitting multiple timescales) could lead to different and novel interpretations of the same data. This is a huge issue, because if the ruler you’re using to measure stuff is messed up, then any conclusions you draw about whatever you’re measuring are bound to be messed up. This much is obvious. But what’s not obvious is the extent to which this is a “problem”, and if so, how to fix it. Like I said before, different data modalities (spike times vs. continuous voltage recordings) come with different sets of challenges for doing this right, even if we keep to the same approach of estimating an empirical autocorrelation function and then fitting a (or several) decay time constant(s). In this particular case, there’s probably enough statistical theory to guide unbiased measurements, but it’s the wild wild west when you start folding in other metrics like the integral under the ACF, full-width half maximum, or time-to-first-zero-crossing, let alone more complex measurements based on, e.g., delayed mutual information. It’s likely that these metrics will partially overlap in what they’re trying to measure, and it’s a certainty that they will give different answers when applied to the same data (for a toy illustration, see the sketch right after this list). The question is, do we need a set of guidelines for what to use when, a “dictionary” that maps the relationship between these metrics under different scenarios (empirical or analytically derived), or otherwise some kind of sensitivity or surrogate analysis procedure that quantifies how different the conclusions could be? Further complications arise when you have to make model decisions without knowing the underlying generative process, e.g., whether you fit a cosine term to the ACF will depend on whether you think there’s an underlying oscillation. I don’t know what the right answer is, but I feel like this should be <em>priority 1A</em> before we accumulate too much literature based on shaky timescale estimates without good error bounds. A related issue is just being aware of what exactly the process is whose timescale you’re measuring. Spikes are different from LFPs are different from calcium dynamics are different from fMRI BOLD signals, and it would be great if we could avoid using “neural timescales” as some nebulous catch-all thing to loosely support theories at vastly different scales without specifying the plausible bounds of these modalities.</p>
</li>
<li>
<p><strong>static or dynamic, when and where:</strong> when I first got interested in timescales, it was kind of “obvious” in some sense what to expect, namely, cortical timescales are static properties of neurons (or neural populations), and should follow a cortical hierarchy that increases from sensory to association areas (minus the functionally dynamic bit…). This had enough empirical evidence from several (sparsely sampled) single-unit studies, as well as task fMRI studies, but more convincingly (for better or worse), agreed with a simple and intuitive idea. But in the last few years, even the idea of cortical hierarchy has evolved significantly, partly driven by our ability to measure <em>large-scale cortical gradients</em> in many modalities, including structural and dynamical brain variables. I think the timescales gradient concept still holds, but is likely to be the zeroth-order approximation to reality. That was a long-winded way to say, at this point, <strong>I have no idea nor ideological commitment</strong> to where, when, and how neural timescales change, whether the underlying covariate is spatial location, (internally generated) behavioral state, task-associated demands, or some other thing. Well, I guess what I can say for certain, especially after the workshop, is that the timescale of <em>anything</em> is more likely to be dynamic than static, especially when you throw enough different contextual covariates in the bag. Is this surprising? Not really, because the brain is a non-stationary dynamical system that has to somehow reflect, process, or otherwise be coupled to another non-stationary dynamical system that is our world, and static timescales would pose quite a limitation in its representational capacity. A more fruitful question might be: <strong>under which scenarios <em>are</em> timescales static (or invariant)?</strong> In other words, if timescale is the ruler, then which <em>things</em>—and things can include any abstract quantity such as brain regions, cell-types, or tasks—are the “same”, and by extension, which things are “similar”? This is simultaneously a trivial and a complex question: trivial because if you take the perspective that timescale is just another characterization (or summary statistic) of neural dynamics, then we already know what to do with it. After all, systems neuroscience has lots of such measurables. Take the simplest one: stimulus-evoked firing rate. When a neuron increases in firing rate, we say that it’s <em>tuned</em> to (properties of) the stimulus that triggered that response, and this tuning is often graded. From this perspective, we can churn through our normal pipeline, and the only difference is, instead of characterizing how strongly a neuron is responding to something—which has an intuitive though arguably questionable interpretation—we are talking about <em>how long</em> it’s responding to something. You can even go second-order and characterize how timescales change, or their rate of change. As long as we can be relatively certain that our timescale estimates are good, we can do this ad nauseam and still provide scientific value by gathering a collection of such data. But as much as I’d like “timescales” to be the 2020s equivalent of early-2000s-fMRI, the complex part of that question is: wtf does it actually mean?</p>
</li>
<li>
<p><strong>linking measurements to theories:</strong> well, wtf <strong>does</strong> it actually mean? In other words, having gathered all this data about how timescales vary under which circumstances, what do we learn about the brain? My default philosophical response is: what does measuring firing rates teach us about the brain? My default cynical response is: we just need a catchy one-liner theory about what timescales “represent” for it to take off. But seriously, what use is it? Well, with my small ideological bias, I think measuring the time constant of dynamics available to a dynamical system is a necessary step in characterizing and understanding this dynamical system, but we don’t even really have to drink the dynamical systems Kool-Aid. If you stick with the broader perspective of system identification, we can cross-reference a much older body of literature that looks at neural systems as input-output transformations. In that framework, looking at the timescale (of the impulse response, for example) is as natural as characterizing its resonance frequency or spatiotemporal filter, and I’m almost certain Walter Freeman III or somebody did this 30+ years ago. My secret agenda is to take computational neuroscience <strong>back</strong> to the golden ages of FFTs and linear systems analysis, but if you want to go the modern route, here’s something for you too: computation-through-dynamics requires, well, dynamics, and the characteristic timecourse of a particle traversing through the neural population state space is <strong>probably</strong> quite important. I mean, I didn’t come up with this just now; there is a body of literature on this as well (e.g., <a href="https://www.nature.com/articles/nature23020">Runyan et al., 2017</a>), and it’s also something that some of our workshop speakers are actively looking into. For a much more eloquent and broad coverage of specific examples related to this issue, please read this <a href="https://www.sciencedirect.com/science/article/pii/S0959438815001865?via%3Dihub">great paper</a>.</p>
</li>
<li>
<p><strong>mechanisms: biological and otherwise:</strong> my personal philosophy here is that timescales are a readout (or observable) of a system whose latent (and biologically mechanistic) variables are the quantities we’re truly interested in. What does this mean, exactly? Practically, it means using the more nebulous “neural timescales” measurements—specifically, the decay constant of spike train and LFP autocorrelations—to understand parameters of the neural circuit that <em>you cannot measure</em>. This can include time-related quantities such as membrane timescales, synaptic timescales, and timescales of other processes like adaptation driven by a combination of physical processes (like calcium fluctuation), but also network topologies, ratios of cells with different timescales, etc. Again, I’m not saying this is <strong>the right way</strong> to think about it; I just personally tend to gravitate towards these measurables as “<strong>physiologically interpretable biomarkers</strong>”, because it’s easy to imagine how such indirect inferences (when valid) can be of great value downstream, both scientifically and clinically. One could also ask what <strong>mechanisms</strong> give rise to the timescale observations we have, which is a different but very much related question. More strictly speaking, if we want mechanisms, an observable like neural timescales should be used to constrain (or even more explicitly, rule out) theories about plausible biological mechanisms, i.e., the timescales we measure in the brain are what they are, but which of the hypothesized mechanisms are implausible in generating these observations? This is a much broader point about science, and it’s quite a big ask, but in the ideal case, seeing observations that are plausible under one hypothesis does not (and cannot) confirm that hypothesis, even though that’s what we (myself very much included) often do. That aside, we can also use timescales to constrain non-biological hypotheses, and I’m thinking along the lines of information propagation models that Lucas and Brandon talked about. Going this route, we can sketch a model of the brain that is totally non-biological (I guess you can say a model at the algorithmic or computational level, if you want), but nevertheless use timescales and any other readout you can get your hands on to constrain the model, or plausible parameter regimes of the model. As an example, if we model spike propagation as a branching process, are our observations consistent with a system near criticality? Do they <em>rule out</em> sub- or supracritical regimes? From that perspective, timescales are not any more special than any other readout of the circuit, certainly not epistemologically (looking at you, firing rate), and by analogy, it could be as simple as the higher moments of a distribution: when mean and variance are not enough, maybe the skew would be useful in distinguishing between two hypothesized distributions; that’s all.</p>
</li>
<li>
<p><strong>consequences (and the f-word):</strong> the f-word here being <em>function</em>, which I would like to avoid because it presupposes some kind of intention (but it’s in the workshop title, I know…). It’s possibly (<em>probably</em>) an inconsequential semantic debate, I acknowledge this, but it’s my blog, and if you don’t like it, you can function-off (just kidding). I am happy to talk about consequences, however, both in terms of biology, as well as computations and behaviors. We saw many examples in the workshop where neural timescales could potentially implement some computation (the most explicit one being Manuel’s timing networks), and it’s easy to imagine hypothetical scenarios where certain autocorrelation structure in the spike train might selectively activate downstream neuronal or network processes that are “tuned” to a timescale (think spike-timing dependent plasticity). Actually, correlated neural code is a thing, so why not <em>autocorrelated</em> neural code (credits to Matteo for co-coining the term, though it’s possibly <a href="https://www.nature.com/articles/428382a">a re-invention</a>). Here, the difficult part is determining whether a particular observation about timescales is necessary to induce the downstream consequence, or whether it’s merely a reflection of something else. As a simple concrete example, are single neuron spiketrain timescales important given the population spiketrain timescale? In other words, it’s difficult to isolate a single aspect of the spike train statistics (or LFP, or whatever), be it firing rate or timescales, while holding all else constant, and argue that <em>that</em> was really the crucial ingredient. This strict counterfactual requirement is basically impossible to realize in-vivo, so we try to make inferences from evidence in the reverse direction, which is less powerful but obviously a lot more feasible, i.e., perturbing behavior through task design and seeing how it affects timescale measurements, and in stronger cases, even perturbing the brain directly (like Lucas’ experiments in the first talk). I can sit here for the next 3 hours and write down potential ways in which cognitive and behavioral variables could be correlated with or affected by neural timescales. Many of those will be proven wrong, but a small subset of <em>a lot</em> is still a lot, and I’m not sure if that’s the regime we should head towards, i.e., correlating timescale changes with everything under the sun, as we’re prone to do (oscillations and fMRI are just two examples that come to mind). At the same time, I don’t have a better idea of what to do, other than perhaps supplementing behavioral experiments with building explicit dynamical models of computation, in which one <em>can</em> control neural timescales (whichever measure you choose) while holding all else constant…</p>
</li>
<li>
<p><strong>understanding through building:</strong> …which brings me to the point of computational models. Maybe it’s the engineer in me, but I really enjoyed all the different flavors of models in the afternoon session of the workshop, perhaps as a result of the sense of control they give us. Models give us the ability to operationalize these hypothetical counterfactuals in a straightforward way, i.e., if we have a model that performs some computation through its dynamics, then does (for example) the presence of heterogeneous timescales matter? In both Nicolas’ spiking networks, as well as Vy and Shailee’s LSTMs, the answer seems to be a resounding yes, since one can simply set up the alternative by giving the units homogeneous membrane timescales. Does that mean a spiking network or LSTM cannot do the task equally well without heterogeneous timescales? No. But we could potentially say something like, given the same learning mechanism or even the same learned connectivity, one architecture is superior to the other in the context of a specific task. Of course, when possible, one should also give the model <em>other</em> mechanisms that can plausibly result in similar computational benefits. Even though this is not necessarily always the practice in modeling, I think it could lead to some very interesting research questions, i.e., are there degenerate mechanisms that can complement or replace heterogeneous timescales in making network dynamics and computations robust? A further difficulty, in this particular case, is that constraining the timescale parameters of the model is easy, even if it’s biologically unrealistic, because they’re often explicitly parameterized (like, it’s an array you put in). However, constraining the timescales of the <em>outputs</em> of the model, which is the quantity we can observe in our experiments, is more difficult. This would be easier if we knew which model parameters affect which aspects of the model output, but that takes us back full circle to the question of mechanisms. Regardless, it’s important to differentiate the timescales of the inputs, the model parameters, and the model’s output, and to be vigilant about confusing one for another (I know I’ve made this mistake several times).</p>
</li>
<li>
<p><strong>scalefull or scalefree; exponential or power law?:</strong> as a 1/f aficionado, the final point I want to explicitly bring up is this tension between characteristic timescales and scalefree/self-organized criticality theories of the brain. I’m not sure if I’m hitting a wall where there is none, and it’s not really a well-defined question, but I have a funny feeling in my stomach when I think about harmonizing these two conceptual frameworks (see this <a href="https://pubmed.ncbi.nlm.nih.gov/19836433/">review paper</a> for a very nice treatment of this discussion): on the one hand, exponential decay is a very common phenomenon in nature. Or, rather, many natural processes are well approximated by an exponential decay, that being the stable solution to a system of linear differential equations. On the other hand, many processes in nature are fractal, or scalefree, and can be captured by power law distributions of sizes, durations, or energy, while the generative model that produces power law observations—systems at or near criticality—is theoretically attractive for many reasons, such as the balance between sparsity and robustness. Are these two perspectives at odds with each other? Not necessarily, as one could simply take a power law distribution of timescales to realize scalefreeness (similar to the multi-timescale LSTMs), which amounts to stacking up a lot of exponential decays of various time constants (the second sketch after this list plays with this numerically). While this is one plausible explanation, and it works in parameterizing the system in the way we want independently of the potential mechanisms that give rise to it, it feels a little “engineered” or contrived as a theory. After all, who’s behind the scenes setting up such a nice distribution? Alternatively, there are theories (mainly from statistical physics and the complex systems tradition) that could more parsimoniously explain the power law observations, by which I mean with fewer hand-tunable degrees of freedom in the generative model. But in that realm, my impression is that people often use a power law distribution (of, for example, timescales) as <em>evidence</em> of their claim that the system is at criticality, which would be circular if we then used that as an explanation. Lucas’ generative model and Brandon’s coarse-graining analysis came together nicely to give this perspective some weight, but I wonder how it would connect with the rest of the (more vanilla, if you will) observations? Maybe our theories should specify when and where the brain is at criticality, and when it is, what kind of timescale (and other) observations we should expect? How many neurons should we record to have confidence in falsifying the theory one way or another? But more importantly, do any of the entities we measure, like a neuron, actually <em>have a “characteristic timescale”</em>? Sure, the RC-circuit equations capture single neuron membrane dynamics well, from which a time constant just falls out, but perhaps the multitude of network mechanisms then ensure that the neuron never “operates” at its fixed timescale?</p>
</li>
</ul>
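<p>Since the homogeneous-vs-heterogeneous comparison comes up a lot, here is the first of the two toy sketches promised in the list above. To be clear, this is not any of the speakers’ actual models, just a bare-bones leaky-integrator rate network in which the only difference between the two conditions is whether the membrane time constants are a single shared value or an array of log-spaced values; all names and parameter choices are made up for illustration.</p>
<pre><code class="language-python">import numpy as np

def run_leaky_rnn(taus, inputs, dt=1.0, seed=0):
    """Toy leaky-integrator rate network; `taus` is literally 'an array you put in'."""
    rng = np.random.default_rng(seed)
    n_units = len(taus)
    W = rng.normal(0, 1.0 / np.sqrt(n_units), (n_units, n_units))  # random recurrent weights
    x = np.zeros(n_units)
    states = []
    for u in inputs:  # inputs has shape (T, n_units)
        # each unit leaks toward its recurrent + external drive at its own rate
        x = x + (dt / taus) * (-x + np.tanh(W @ x + u))
        states.append(x.copy())
    return np.array(states)

n_units, T = 100, 500
inputs = np.random.default_rng(1).normal(0, 0.1, (T, n_units))

# homogeneous: every unit shares one membrane timescale
taus_homog = np.full(n_units, 10.0)
# heterogeneous: timescales spread over more than an order of magnitude
taus_heterog = np.logspace(np.log10(2.0), np.log10(50.0), n_units)

states_homog = run_leaky_rnn(taus_homog, inputs)
states_heterog = run_leaky_rnn(taus_heterog, inputs)
</code></pre>
<p>In an actual study you would then train both variants on the same task, with the same learning rule, and compare performance and robustness, which is exactly the kind of counterfactual the models discussed in the workshop make possible.</p>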
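<p>And here is the second sketch, for the “stack up exponentials with power-law-distributed time constants and you get something scalefree-looking” claim in the last item. Again, this is a numerical illustration of the general idea with made-up numbers, not a reanalysis of anything presented at the workshop: the autocorrelation of the mixture is roughly a straight line on log-log axes over intermediate lags, while any single exponential bends sharply downward.</p>
<pre><code class="language-python">import numpy as np

rng = np.random.default_rng(0)
n_units = 10_000

# draw timescales from a truncated power law, p(tau) ~ tau^(-alpha) on [tau_min, tau_max],
# via inverse-transform sampling
alpha, tau_min, tau_max = 1.5, 1.0, 1000.0
u = rng.uniform(size=n_units)
taus = (tau_min ** (1 - alpha) + u * (tau_max ** (1 - alpha) - tau_min ** (1 - alpha))) ** (1 / (1 - alpha))

lags = np.arange(1, 200)
acf_units = np.exp(-lags[None, :] / taus[:, None])  # each unit: a plain exponential decay
acf_mixture = acf_units.mean(axis=0)                # population: a stack of exponentials
acf_single = np.exp(-lags / taus.mean())            # one exponential with the mean timescale

# on log-log axes the mixture is close to a straight line (slope roughly -(alpha - 1)),
# while the single exponential is clearly curved
slope = np.polyfit(np.log(lags), np.log(acf_mixture), 1)[0]
print(f"approximate log-log slope of the mixture ACF: {slope:.2f}")
</code></pre>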
<hr />
<h3 id="some-reflections-as-a-participant-and-co-organizer">Some reflections as a participant and co-organizer</h3>
<p>Alright, I think I’m just throwing out wild thoughts at this point, and I’ve covered most of the scientific points I wanted to cover. If you have thoughts on any of these issues, I would be very happy to hear about them. I just want to end this blog post with some final thoughts about the workshop experience.</p>
<p><img src="/assets/images/blog/2022-03-28-larry.png#center" alt="fig_larry" /></p>
<p>Hands down, our workshop was the best part of Cosyne for me. It kinda started out as a joke, but I truly believed it by the end. It’s not that the main conference experience was bad by any means, it was just…looser. The talks were interesting, and so were the interactions at the posters, but there wasn’t a narrow thematic constraint. This is obviously by design, and in some sense I’m stating the obvious: broadly interesting conference is broad. Wow, much insight dude.</p>
<p>Yes, it is obvious, and I would not be saying this if not for the fact that the workshops, as a whole, were equally diverse in topics, and one does not have to sit in the same room the entire day. In fact, in my previous years at the Cosyne workshops, I’ve always hopped from one room to another constantly trying to see talks that were maximally overlapping with my interest, or were on the hot topic du jour. Obviously, I couldn’t do that this year as a co-organizer sitting in the front, but I think it actually worked out for the better. Perhaps this is also obvious, but I think a workshop is truly a scenario where the <strong>whole is more than the sum of its parts</strong>. When you see individual talks, you hear about the specifics of the experimental design or model implementation, and some background information to motivate the work. When you see a whole set of talks on the same topic, even if some of those talks lie squarely outside your domain of interest and expertise, you still get to see much more clearly the things that were in common, i.e., the shared motivation and context, the things that are possible, as well as what’s <strong>not</strong> talked about, i.e., gaps in thinking or even opportunities for future research. Clearly, the panel of speakers did not include everyone in the world working on timescales. But if your sample contains 10 people that come from vastly different perspectives, and 8 of them mention the same stuff or acknowledge the same issues, or, maybe are even puzzled by the same observations and questions, then there’s something interesting going on. Furthermore, the fact that all the talks are on the same day makes it just much more likely that interesting stuff will get into your head by osmosis (at least that was my experience). I would say that attending the entirety of a workshop is the most efficient way to get familiarized with a subfield or topic as a novice, except even that would be underselling it: I think even “experts”—actually, especially experts—would benefit greatly from seeing a familiar issue from a different perspective. I definitely don’t consider myself an expert, but I learned something new from basically every talk, and I don’t just mean a specific scientific finding, but more so a new way of thinking about things, timescales and beyond. I suppose this could partially be due to our explicit intention in sampling a diverse range of perspectives when selecting the speakers, but my guess is that even if you sat through 2 days of people going through competitive models in the BrainScore workshop (no shade), you’d still learn something about science on the meta-level.</p>
<p>On top of that, some things I had to do as a co-organizer ended up really adding to the experience. Beforehand, Brad gave me some really helpful advice that, as an organizer, I should take notes, always prepare a few questions for the speaker in case nobody else asks questions, and otherwise appear engaged. These are all sensible and obvious things to do, and I probably would have tried to do them anyway if only for the optics, but to no one’s surprise, appearing engaged (and not being on my phone) actually makes you engaged, and when you’re engaged, you take away a lot more. In addition, when you then try to prepare questions that are not completely superficial, it requires actively listening to the talk and thinking beyond what the speakers have presented. Obviously, nobody prevents you from doing this even if you’re not an organizer, but this is extremely difficult to sustain throughout the whole day, and if I didn’t have to, I probably wouldn’t have, just because it’s quite easy to slip into “well let me just check Twitter for a second”. Actually, as an audience member, I feel like sometimes you don’t want to be <em>that person</em> that has a question at the end of every single talk, which is exactly who you want to be as the organizer. Another thing that helped was that Roxana and I had to come up with good questions or prompts for the two discussion periods, and that required another round of active thinking and bouncing ideas off of each other, as well as generally trying to tie everything together. Thankfully, this brain expenditure happened against the backdrop of beach and sun.</p>
<p>One last thing I want to mention, and probably the most important: being a part of the same workshop for the entire day gives you a chance to interact with the speakers and other participants in a way that’s more natural than asking a question at the end of a talk, at least to me. I’m probably not in the minority when I say that I feel very awkward approaching somebody I don’t know to have a conversation at a conference, even more so when it’s not about the immediate scientific subject at hand. At the same time, the conversations I enjoyed the most were not really about the talk we just heard, or even timescales, but about life, family, career in academia, the war in Ukraine, film, surf, or random tidbits of culture, be it about Canada, California, or Tübingen. For me, there’s really just no good way to transition from “I have a follow-up question about something you mentioned in your talk” to, well, I don’t know what, but something unrelated and spontaneous. When you hop in and out of workshops, you’re one of many faces that somebody sees. When you stay at a workshop and repeatedly ask questions or engage in the discussion, you become somebody I know. I realize much of this is due to the fact that I was in the privileged position of a co-organizer: it was within our roles to mingle with speakers, as well as get everyone to meet and talk to each other. Because of this, we had the opportunity to ask everyone to have lunch together, so we could at least get to know each other a bit outside of the “classroom”, so to speak. I really enjoyed this casual interaction, as well as dinner and drinks, and follow-up conversations I had with them afterwards. I think we could’ve done a better job of involving the non-speaker participants (who seemed clearly engaged and interested) in the more casual follow-up interactions, like going out for dinner, so I think I will be mindful of that if I’m ever in such a position again, both for professional networking and just connecting potentially like-minded people.</p>
<p>Anyway, I obviously realize that not everyone will have an opportunity to be accepted as a workshop organizer, and I don’t want to necessarily prescribe this approach for anyone. But I think from now on, as a participant, I will try to pick a single workshop for the entire day, and just stay there and learn as much as possible.</p>
<hr />
<hr />
<p>Alright, that’s all for this extremely long report that nobody asked for. Until next time!</p>
<!-- speaker profiles -->
<!-- workshop refs -->
<!-- figures -->
<hr />
<iframe width="560" height="315" src="https://www.youtube.com/embed/-y9rkNXBuhg" frameborder="0" allow="autoplay; encrypted-media" allowfullscreen=""></iframe>Richard GaoI accidentally wrote a small book about a conference I went to. Again.What I learned as a PhD application evaluator.2021-12-23T00:00:00+00:002021-12-23T00:00:00+00:00https://rdgao.github.io/blog/2021/12/phd-application-protips<p>Last month, I served as one of the PhD application evaluators for my <a href="https://imprs.is.mpg.de/">village’s local graduate school (IMPRS)</a>, and I took some notes while reading through the applications on what I looked for, in real time. It’s probably a bit late for this year’s application cycle to be immediately helpful, but I figured it’s already written so why not share them, so I fleshed it out a bit into a full post.</p>
<p>I had helped screen late-stage candidates a few times before, like giving internal evaluations for the handful of people being considered for our lab, etc., but this was the first time I formally served as an at-scale pre-screening evaluator, i.e., reading large batches of generic applications targeting the graduate program. This is a very different thing, and it was an interesting process to slowly realize which qualities in the package I was looking for—explicitly and implicitly—and therefore what made a great PhD candidate in my mind (right or wrong).</p>
<p>There’s a good chance that I get canceled for a) spilling our ivory tower secrets, and/or b) being unreasonable one way or another in the evaluation process. Still, I’d be happy to hear thoughts about any of this: I’m curious to see how this fits with other people’s experiences in general, and especially if you find some part of this unreasonable / unfair / unrealistic.</p>
<p>TL;DR? I think the most useful takeaway from all this, aside from the specific advice about the components, is to <strong>put yourself in the shoes of the evaluator and optimize your application to make their life easier.</strong> This is not the interview: this is the stage before the interview, and has a different set of things one should optimize for—namely, to get to the interview.</p>
<hr />
<h3 id="brief-context-on-graduate-school-applications-and-intercontinental-differences">Brief context on graduate school applications and intercontinental differences</h3>
<p>About the specific situation: the IMPRS graduate program accepts PhD students from all over the world into Uni Tuebingen, Stuttgart, and their adjacent MPIs, and I think it gets a couple hundred applications targeting the program itself, not a specific advisor. This big batch was split between lots of evaluators, and each application was assessed by 2-3 people, so I had about 20 or so to look through and give a grade between A and C. This was the first stage of the process, and the applications that successfully passed this prescreening stage (A-B grade across the evaluators) moved on to be evaluated by their potential PhD advisors, and were contacted for phone / video interviews. It was similar for UCSD Neuroscience as far as I recall, which was another big program that gets a lot of applications, and I think there the current PhD students actually help in evaluating, while UCSD CogSci doesn’t have this big pre-screening stage that gets outsourced to people other than the faculty themselves. What’s also pretty universal are the components of the application package: <strong>motivation / personal statement, CV, transcript, standardized test scores, and recommendation letters.</strong></p>
<p>Some key differences specific to the geography: being a European program, most of the candidates I evaluated here have already completed, or are on their way to completing, a Master’s degree, in comparison to the direct-to-PhD route from Bachelor’s that’s more common in the U.S. So this changes what would be reasonable for an evaluator to expect, e.g., a Master’s student may already have conference presentations or even papers, though this also depends strongly on the field (biological neuroscience vs. machine learning). Also, PhD positions in Germany (and most of Europe?) are actually not usually affiliated with a “graduate program”, and are often advertised (more or less) like a regular job post. So it’s totally possible to have accepted a PhD offer with one of the faculty members affiliated with IMPRS, but not be accepted into the program. This is not usually possible in North America, as you’d almost always start a PhD as a part of the graduate school of a university, and begin with your cohort in September. This creates a sort of strange scenario in the case of European “graduate schools”, and certainly in this particular situation, where some applicants have already started their PhD with an affiliated advisor, while other “external” applicants are applying blind to compete for “real” spots in the graduate school. Although, this also implicitly happens a lot in the U.S., where a current lab tech or research assistant applies formally to continue with the same lab for their PhD, and if the advisor agrees, then the rest of the application process is pretty much a done deal (unless they really bomb the interview for whatever reason). In any case, this post is about getting through the batch pre-screening process of a graduate school, and is therefore, strangely enough, more applicable to the average North American PhD application than a European one.</p>
<hr />
<h3 id="summary-consider-your-target-audience--general-strategies">“Summary”: consider your target audience & general strategies</h3>
<p>One note to put you in the shoes of the evaluator (me!), and probably the most helpful: I think for most people doing app reviews, they don’t consider this a part of their real job aka research, but obviously still want to (I hope) do a good and fair job since it tangibly impacts a student’s academic future, especially if they are from a situation or country where good opportunities like these are rarer. This probably means reading through applications on the weekend or at night, in between real life, so the cost function is very much weighed by a personal time constraint, and if one is assigned only 20 applications (out of, say, 300), that’s about 3 hours at 10 minutes per application. Practically, this translates to “make a good decision as quickly as possible”, and not “be absolutely certain that the decision was the right one”. I don’t think this is necessarily “optimal”, but it’s the way it is, and multiple evaluators are there to mitigate randomness in the process.</p>
<p>Given this, there is a very complicated decision-making sequence in my head, a lot of that having to do with an internal evaluation of whether a thought was a correct “first impression”, or an implicit and systematic bias (an aside: kudos to the coordinators for requiring the evaluators to watch <a href="https://www.ted.com/talks/yassmin_abdel_magied_what_does_my_headscarf_mean_to_you?utm_source=whatsapp&utm_medium=social&utm_campaign=tedspread">these videos</a> on implicit bias - my biases definitely don’t get fixed with a 15-min video, but at least it primes me for this train of thought). To explain this in a much nerdier way, it’s like a drift-diffusion process where I’m shooting for one of the two boundaries: clear accept (A) or clear reject (C), and if I’ve read through the whole thing without a clear opinion, it’s a maybe (~B). I think in reality, I actually had a drift bias towards the positive boundary, meaning I’m usually moving through one component of the application package after another <strong>looking for</strong> evidence that would support a positive decision, and the final evaluation is “how long did it take for me to reach a <strong>robust</strong> positive evaluation?”</p>
<p><strong>What does this mean for you, the applicant?</strong> Present information that would bias toward a strong positive decision as quickly and concisely as possible, and clearly motivate (i.e. explain away) why factors or potential red flags that would induce a strong negative decision should be disregarded. Remember that you are one of 300 people applying, and that, more importantly, you can’t change your grades or publication record in the week that you put together your application package, so all that stuff is set in stone. Just work with what you have, and make it very easy for the evaluator to reach a positive decision, i.e., within the first couple of paragraphs of your personal statement. I’m sure this logic applies in many scenarios where pre-screening happens en masse, i.e., HR at Google going through resumes or whatever. If you know what the evaluator’s cost functions are, then it’s simple to target those directly, but that’s often a very black-box decision process, and even more frequently just implicit and idiosyncratic, <strong>so candidates often run the risk of barfing out everything they have to say</strong> about themselves and thereby putting emphasis on nothing.</p>
<p><strong>So, part of what I want to do here is to make those targets explicit for those not privy to the ivory tower word-of-mouth wisdom</strong>, at the risk of divulging the potential fact that I’m actually a huge racist and elitist bigot or something. Finally, in my opinion, it’s much better to present yourself concisely and accurately, and risk a rejection due to a lack of mutual fit (and try to not take it too personally), than to be vague and broad to get an acceptance only to later find out that it’s <strong>not</strong> a good fit.</p>
<p>All that being said, the rest is organized as follows: 0) qualities of a capable PhD applicant I’m looking for (to confirm) in the components of the application, and then, in order of priority (for me), they are: 1) recommendation letters, 2) personal statement / CV, 3) grades and standardized test scores. They are ordered this way because a recommendation has a high upside potential, but is at worst neutral if it’s not the most enthusiastic reference (very few people will straight up say this person is incompetent). Grades, in comparison, have low upside potential because almost everyone applying for grad school has good enough grades, and really bad grades could potentially tank you; but if someone didn’t get good letters, then at least really good grades will earn them a consideration. Pretty sure that’s how I got into grad school in the end.</p>
<p>Again, I feel like I have to repeatedly state this caveat, which is that all of this is very idiosyncratic, and I’m really not even sure how differently different people treat this process, so take it with a bucket of salt before you apply any of the “advice” here. The one piece of applicable meta-advice is probably to scope out what people at the program you’re applying to are roughly looking for (just like any other cover letter you write in an application).</p>
<hr />
<h3 id="0-qualities-of-a-phd-ready-candidate">0. Qualities of a PhD-ready candidate</h3>
<p>I’m a year out of my own PhD, wtf do I know about picking good PhD students? Probably not much, to be honest. At the same time, I’m the schmuck reading your application, and two thirds of the other evaluators are schmucks just like me (postdocs or even senior PhD students), so this is what you have to work with—welcome to a supply-rich labor market. I’m not saying these are the qualities that will necessarily enable a successful and timely PhD—many things outside of your control can often play a stronger role—and <strong>obviously</strong> much of this is idiosyncratic to me and my personal experience, as well as those of my friends and students I’ve worked with, and just as importantly, is a function of my (somewhat cynical) view of academia. To really flesh this out, it will take a whole other blog post (that I plan to write), but for the time being and considering the huge caveats I just mentioned, these are some qualities I looked for in the applications / candidates, and I’m very curious if these are completely off the mark. It also has to be taken into account that, when I started my PhD, I don’t feel like I could have demonstrably proven any of these qualities in myself, certainly not by my research output during my Bachelor’s degree. But that’s more just a generic comment about evaluations: there will be false negatives no matter how tight your criteria are, because an evaluation of the present <em>sometimes</em> has no correlation with the future, and I’m thankful somebody took a risk on me.</p>
<p>So, here’s the list, with little to no explanation because that’s for another day. Note that these are mostly “traits” that are harder to teach, compared to, say, linear algebra or programming:</p>
<ul>
<li>perseverance (self-explanatory)</li>
<li>ability to work independently, and alone (again, self-explanatory)</li>
<li>resourcefulness in learning (how to use Google)</li>
<li>ability to work (or at least co-exist) with other people</li>
<li>critical thinking and maturity in acknowledging limits</li>
<li>interest / passion (in pretty much <em>anything</em>)</li>
<li>integrity and responsibility (especially when you fuck up)</li>
</ul>
<p>There are some other important qualities to consider, but clear contradicting evidence in any of these above traits would be a red flag for me. Again, I am in no way claiming these are “the traits” one must have, this is just my list, and every one of the evaluators has one (implicitly or explicitly).</p>
<hr />
<h3 id="1-recommendation-letters">1. Recommendation letters</h3>
<p>One-liner: make sure the people that are writing letters for you are dependable and will at least make an effort to demonstrate that they personally know you: better to ask someone who knows you well (but is a little less famous or not as far along in their career) than a big shot who will generically describe you like a number in a classroom.</p>
<p>It’s probably not a good sign that the first thing I look at is the one component that, for the most part, the applicant has no active role in shaping in the short term. But this, by and large, is for me the fastest way to confirm that a person is ready to start a PhD. Basically, if someone you’ve worked with, who has a PhD themselves, says that you have shown evidence of being ready for PhD-level research, that’s about as good as it gets. Best case scenario, I read through the two or three letters and smash that A-grade as long as there are no obvious and unexplained red flags in your personal statement or grades (like addressing the wrong graduate school or a B in statistics while applying to do a machine learning PhD). Then I’m pretty happy because we can quickly move on to the next application. But most of the other cases require a more detailed consideration of the components in the following sections, and for the recommendation letter, my job is to parse whether your recommender:</p>
<p>a) is a British or German academic and therefore scales their relative evaluation of you based on the emotional range they consider appropriate to communicate in a professional setting (this is not <em>entirely</em> a joke),</p>
<p>b) doesn’t quite know how to communicate why you are a good candidate but is nevertheless excited to have worked with you (this is rare for professional academics who often write recommendation letters, but possible for junior people),</p>
<p>c) is subtly trying to signal their reluctance, or is just flat out unexcited / unimpressed, or</p>
<p>d) doesn’t actually know who you are (extremely obvious).</p>
<p>Here’s the one piece of concrete advice I can give on this topic: <strong>pick very wisely who you ask to write these letters for you.</strong> You want somebody to be able to say something of substance about you, that they have concrete experiences working with you in a research setting, and that they are confident this will carry forward into your next stage of life. Particularly convincing are the ones that say “I wish the candidate had stayed in my own lab for their PhD…”. Again, if they spell out in their letter all the qualities that I am looking for in section 0, and justify statements about some or all of those qualities with concrete examples of your interactions, then that’s about as good as it gets for me. One potential issue here is that some <em>letter writers</em> don’t necessarily know how to write convincing letters that follow the claim-evidence structure, but then you’d at least hope they will convey some positive emotions about you, and reiterate claims you would make about yourself. If in doubt, send them this blog post (no, don’t really, that would be a bit…patronizing).</p>
<p>I think the above is pretty obvious; the question is, what do you do when no such person comes to mind for you? In my opinion, what’s worse than an unskilled writer or perhaps a lowly postdoc or PhD student writing your letter is the full professor who really has no idea who the hell you are. Obviously, if you manage to get a convincing letter from Terry Sejnowski or Eve Marder or whoever, saying you were a superb student in their seminar / research project course, then that’s optimal (and frankly you can stop reading at this point). Personally, I think it’s much more valuable to have a full and positive portrait of a candidate from an unknown person than a lukewarm description saying “this student placed in the top 5% grade-wise in my course, and was generally prepared, helpful, and on time” signed by a well-known person because I can obviously read off that information for myself in your transcript, so it doesn’t add much. Hell, if you really have no one in a research setting to write this letter for you, I would rather see a letter from a long-term coach or work supervisor that can explicitly spell out those PhD-ready qualities. But again, this might just be me.</p>
<p>On the flip side, you do have some influence on what their portrayal of you looks like, by <em>reminding or informing the person</em> of who you are: you might provide a lot more supporting information to help them help you, or maybe even ask if this more well-known person <strong>can</strong> write you a good letter. I’ve heard that some people explicitly tell candidates to ask someone else when they won’t be able to say anything substantive, which sounds like a slap in the face, but it’s really doing both people a favor. But it’s a tricky situation to navigate as a student because you might think it’s offensive to ask and unask somebody. So if push comes to shove, I’d say just ask someone who you have a concrete personal relationship with, <strong>and let them know early enough so they can prepare</strong>. Even more helpful is to outline the arguments for them, i.e., “I am resourceful because when our team bus broke down, I managed to get us to the game on time because X/Y/Z…” Though you want to be careful here that they don’t copy something exactly as you’ve worded it in your application, because then it could look like you wrote your own letter for them to sign off on. This happens often enough, and to be honest, I don’t consider this a “red flag”, per se, but it’s just another sign that they don’t know (or care about) you as much as they should, and so it doesn’t really serve as a piece of positive evidence.</p>
<p>A more general rant about recommendations and letters: first of all, the percentage ratings in the form of “this candidate is among the top 5%/10%/20% of students I’ve worked with” are often quite unhelpful, unless the prompt very specifically states <strong>the pool they should be comparing to, or even what that denominator actually is</strong>. An average professor in the U.S. teaches 3-4 classes a year, of maybe about 50 students, which is up to 200 students <em>every year</em>. On the flip side, they might see 5 new research assistants in their lab per year, tops. So “top 5% of students <em>I’ve interacted with</em>” is very different depending on what their interpretation of “interaction” is. I don’t really take this into account anymore when there is a letter attached, and find it hilarious when someone writes a very strong positive recommendation but puts down “top 20%” or something.</p>
<p>But the bigger point is this: as long as we’re still using recommendation letters as a part of the evaluation process, let’s not kid ourselves about how “objective” or fair we can be in academia—it’s inherently based on word-of-mouth recommendations, like your local pizza joint. I’m not saying they’re not useful; like I just outlined above, they are extremely useful for an evaluator to make a judgement—but only under the current system, and in the absence of a better way for the candidate to demonstrate their qualities themselves. This means the person writing the recommendation has a huge say in how they want to portray the candidate, and again, this also very much depends on how well-known they are and how skilled they are at writing these letters. I don’t know if this is something you get taught in professor school, but it’d be quite unfortunate for the poor student who asks a professor that just doesn’t really use nice words in a letter. Conversely, <strong>the ones that figure out that modern academia—like every other human enterprise—is first and foremost a social construct</strong>, and therefore can expedite their own success by placing their mentees into positions to succeed, such as a very competitive graduate program, are the ones that accrue more resources and more opportunities to further the propagation of their ideas in the long term—good or bad. It might be cynical, but it’s not untrue.</p>
<hr />
<h3 id="2-personal-statement-and-cv">2. Personal statement and CV</h3>
<p>Next, the personal statement and your CV/resume offer very different information, but they are similar in that these are the components over which you have the most control when preparing the application—not necessarily the objective content, but in how you present that information. Basically, it should convey <strong>who you are, why you are ready to do research, and why here,</strong> in as few words as possible.</p>
<p>First of all, if you have multiple publications or conference papers at well-known venues, then I don’t necessarily care about who you say you are (unless, again, there are red flags about your ability to coexist with other people). I still look at the recommendation letter first, though, because the tangible achievements are almost always mentioned in the letter anyway (assuming you asked a supervisor / co-author to write), with the additional upside that they might positively comment on you as a person in ways that are not apparent through the publications. The real question is: what should the majority of the applicants—those without demonstrable proof of previous research success—do? Pretty simple: <strong>make my life as the evaluator easier</strong>. Specifically, that means: think about the arguments you want to make (re: the qualities), structure your statement and CV to deliver these claims and the evidence as quickly and concisely as possible, and don’t have any major fuck-ups.</p>
<p>Let’s address that last point first because it’s the easiest. A major fuck-up means, for example, uploading a statement addressing a different school or program. Honest mistakes happen, and there is no situation where somebody <strong>intentionally</strong> uploads a wrong letter, obviously. But what this conveys is a lack of interest, even though we all understand that any given candidate could be applying to 5-20 different schools, at the very least, especially in the North American system come November. This could also reflect a lack of care and organization, but in the end, I’m not sure if this is an objective red flag as much as it is an offence to the evaluator’s ego, like “damn you couldn’t be bothered to at least check it again? I guess we aren’t so special here to you,” which is hilarious because I couldn’t give less of a shit personally whether a candidate ends up in Tübingen or Böblingen or some other small European city with a good graduate program.</p>
<p>But a related and much more realistic scenario is that the statement is so completely generic that it could have been used for any program in the world, which would fail to convey why this place is the “right fit” for you. This means a lack of awareness about the research areas of the PhD supervisors, no actual mention of supervisors you are interested in working with, or just flat-out saying incorrect things. Again, this doesn’t necessarily mean you would be a bad PhD candidate, but it doesn’t really make me excited about having you in the community here. The ego joke aside, a tailored statement could convey a strong fit and possibility for collaborations with people around you, if not in a concrete project then at least being intellectually enriching for all parties involved. Sure, some applications are throw-aways or for “safety schools”, but if every school you apply to thinks that they’re your safety school, then you’re probably gonna have a bad time, never mind wasting a bunch of money on the applications. Practically, the most reasonable thing is to have a free paragraph or two at the end of your statement that can be exchanged in a program-dependent manner. At least go through the effort of Googling some names. Again, this is not just to stroke someone’s ego; it’s for everyone’s benefit that you arrive at a graduate program with some indication that it would be a good fit, and on the off chance it doesn’t work out with your first supervisor, at least there’s some chance that you find another lab working on topics that you’re interested in. If your statement can convey this adequately, it’s a really big plus for me. This seems pretty obvious once you know it, but lots of people, especially those that don’t have a “mentor” in academia, don’t know. I certainly didn’t do this when I applied; that’s how I ended up getting a PhD in Cognitive Science, so it’s not the end of the world, evidently.</p>
<p>Extending that last point but transitioning to the topic of “who you are”: a clear and accurate description of the advisors you are interested in not only demonstrates your research interests, but also your academic maturity. This is honestly pretty rare to see in PhD application statements just because people won’t have had that much experience to delve deeply into a topic, but when I see some version of this, it’s a huge bonus. For example, one application had something like “I know that lots of people are working on different variants of [topic A], but I’m specifically interested in topic A, sub-area X, because of my experience in …” At the end of the day, it all goes back to that list of points I laid out in section 0, and such a statement provides evidence for many of those points, and I should stop repeating myself at this point.</p>
<p>What IS worth emphasizing here is that <strong>you should aim to convey these points as quickly as possible</strong>. An “average” application can really stand out by presenting all the relevant information for your evaluators in the most accessible and direct way, whereas a theoretically “good” application can obscure key points in lots of text. If you claim you have a quality, then immediately provide evidence to support this, and say why that matters. It’s basically what they teach in high school English class for how to write an essay, but nobody pays attention to that shit, at least I didn’t. So here it is again. This is true for most formal academic writing, like a paper, but certainly true for a personal statement: I do not want to be guessing who you are and what you’re interested in. Just tell me, then convince me!</p>
<p>I guess it’s also worth mentioning that the statement is not a laundry list of stuff you’ve done, <strong>it’s an argument that’s supported by the stuff you’ve done</strong>. The argument or claim is: I will be competent / I am interested in doing X, so let me in your school to do it. The evidence is, most likely, “I’ve done X before” or “I have always been interested in X as shown by…”. It’s very difficult to <strong>quickly</strong> distill what exact point you want to make when the claim is not explicitly stated. I get that people often feel shy or embarrassed about claiming something about themselves, like “I am hardworking”. But remember that the entire point of the personal statement is to convey exactly those points, and you make both of our lives easier by stating it upfront rather than trying to be modest and letting me guess (though obviously try to be tasteful and measured in what you claim). Same thing for the CV: if you’re applying to a lab that does research on or with database stuff, and you’ve worked with databases in a previous job, <strong>state it</strong>. Personally, I saw a lot of interesting CVs with a diversity of previous job experiences, and to be honest, having performed well at a job is as much evidence of being ready to work in research as anything else. Important and relevant stuff at the top. I don’t know, is this all obvious?</p>
<p>One last thing I already mentioned previously regarding the letters: if you have a long-standing hobby or a community service that you do, especially one that you dedicate a lot of time to and are perhaps quite competitive in, talk about it. It may be difficult to properly contextualize it in writing, for a PhD application, because the worry is always that “nobody cares that I’ve been knitting for the last 10 years”. But sustaining a hobby over the long term is difficult, and it demonstrates that a person is able to stick to something that they’re interested in for a long time, even through (presumably) difficulties. Assuming you are just as passionate about whatever field of science you chose, this bodes well. Of course, I wouldn’t rely on just this fact to get into grad school, but anything positive helps.</p>
<hr />
<h3 id="3-grades-and-standardized-test-scores">3. Grades and standardized test scores</h3>
<p>Not much to say here: good grades (and test scores) are better than bad grades, and that’s not something one can change during the application process. What was surprising to me personally was to realize that, as much as I was a believer in the fact that grades are uncorrelated with success (or survival) in grad school, I still viewed a good transcript (and good GRE scores) very positively. In some cases, it rounds out a good application that had great letters and CV; in other cases, it saved an otherwise unremarkable application from being straight up tossed out. In the end, it’s another metric, and while I think the skills to get good grades in university are mostly orthogonal to doing well in research, some things do overlap, namely: being organized, being able to learn, and being consistent (though one can get good grades without those qualities).</p>
<p>Put another way: there are multiple ways for a person to achieve a near-perfect transcript. You could be “naturally smart” in the sense that none of the stuff ever challenged you, or you could be a “book smart” person that knows how to study and perform well on tests, or you might actually enjoy the field of your study so much that it was fun to delve into things. These 3 different people will face different challenges when they first start a PhD, and it’s unknown whether they will be able to overcome them. But it’s certainly the case that getting good grades does not directly lead to an easy time in grad school. At the same time, having bad grades doesn’t imply that one cannot do research, and there may be a million reasons why somebody has mediocre grades in university, ranging anywhere from personal issues, working a job, or not having found something that’s interesting, to being “dumb” (however you want to define that). In the end, the transcript is the final and observable outcome that was the result of all those factors, and it’s impossible to guess, from that alone, what kind of person the candidate was and what challenges they faced during their university education. But good grades don’t hurt, and if you do for some reason have some bad grades in a transcript, you should maybe explain why that happened, and most importantly, why that should not imply you are incapable of doing research (and in parallel, point to things that do say you can be a good PhD student).</p>
<p>Finally, and onto the most controversial thing: GRE and standardized tests. It’s difficult to gauge whether an A+ transcript from one school (or country) means the same as one from another school. Viewed charitably, schools vary in the difficulty of the material they present on a particular topic, and this is true even from professor to professor, from department to department. This is just a fact, and leads to things like taking a specific course in one semester vs. another because a certain professor is teaching it. Viewed uncharitably, some schools inflate grades more than others. Regardless of the reason, the very real difficulty in evaluating a candidate, especially those from different countries, is to gauge how much stock to take in their perfect transcripts. One strategy is to say, let’s just not look at the transcript in either case, since it’s quite subjective. Another strategy is to try to come up with some standardized measure that gets rid of these systematic variations. Yet another is to have someone familiar with the respective systems do these evaluations, e.g., someone that knows what a 4.0 at Princeton vs. Stanford REALLY means, and which of the universities in Iran are more difficult than others.</p>
<p>The first strategy sounds more fair in theory when viewed within a limited context, but in practice just puts more weight on other, and potentially even more subjective, measures, like the recommendation letters. The third is probably more fair, but any one evaluator will be familiar with at most 3 or 4 schools or systems, compared to as many schools as there are candidates in any batch of applications, and you certainly don’t want to down-weight a school just because you’ve never heard of it. The second strategy, well, that’s the GRE. Prior to this experience, I would say that I was a mild opponent of the GRE: I just didn’t see it as being really that useful in evaluating the candidate, for how much it costs. But having been on this side, I realized that those numbers are just another set of numbers that provide another view on the candidate—it’s another column in your data matrix. More specifically, it’s yet another chance for a set of candidates to really showcase something positive about themselves. Scoring perfectly on the GRE by no means guarantees research success, but it tells me that this person can do tests really well under time pressure, and probably had to cram quite a bit the month leading up to it (which IS a skill useful in many walks of life). At the same time, understand that these standardized tests are <strong>not</strong> standardized, or even unbiased, for many reasons. And as such, they should be considered with ample precaution and context such that people are not systematically penalized because of that.</p>
<p>Nevertheless, it’s yet another piece of information, and personally, it gives me another opportunity to view a candidate positively if they had nothing else. For many people, doing <strong>unpaid</strong> research in their spare time during university is simply impossible, which pretty much eliminates their potential for getting a good letter or publications on their CV. I’m speaking as a person that would have never done research in the summer had I not been fortunate enough to get funded positions through the Canadian government (shoutout to NSERC and the motherland). A Master’s degree (and the associated thesis) mitigates that somewhat, but it still costs money to do a Master’s degree for two years. In the end, I don’t think I have a problem with standardized tests, I have a problem with mandatory standardized tests with an exorbitant price tag. If you have a problem with how much the GRE costs, don’t get rid of the GRE, get rid of your graduate program’s application fee, which literally provides zero information on the candidate other than whether they’re willing to pay 100 bucks. Most people apply to at least 3 schools, that’s your free GRE right there. If you have a problem with the GRE being systematically biased, then you should motion to get rid of recommendation letters too. But at the very, very least, standardized tests are something that a candidate can control, on the timescale of a month or two, and could be someone’s last real opportunity to prove themselves in their application. But then again, I don’t know how many such slumdog millionaire scenarios there actually are, of someone who is really saved by the GRE and didn’t otherwise have good grades, letters, or CV.</p>
<p>So what should you, the candidate, take away from this rant? Not much. Try to get good grades and test scores. Though I guess my one recommendation is that—and it surprises even me to say this—if it is optional for a school, and if it has the chance of being the most outstanding thing in your application, then I’d consider making the investment to do it. But I really wouldn’t worry too much about it otherwise (obviously study and try to do well if it is mandatory).</p>
<hr />
<h3 id="final-thoughts">Final thoughts</h3>
<p>During the couple of weeks that I was working on this post, I saw this <a href="https://twitter.com/mbeisen/status/1465712429271187458">tweet</a>, which basically amounts to saying “we have no idea how to pick ‘good’ PhD students and our arbitrary criteria are reinforced by confirmation bias”. For the most part, I (emphatically) agree. I think a much better question than “who will do well in graduate school” is “who will work well with me”, as a PhD advisor. The latter question determines the success of the student much, much more than some homogeneous average quality, and is a function of both the student’s and the advisor’s styles. I have written down a list of qualities up top that I think help a person do well, but that’s very much limited to my experience and personality, and also because I think “being a genius” is not something worth putting as a bullet point.</p>
<p>So why the hell did I write this thing? Let me re-iterate: this post isn’t about being a good PhD student or being competent at research, it’s about how to optimize your application so that you have the chance to do what you want to do in science, by first passing the hurdle of differentiating yourself enough in a batch of 300 applicants. I do believe in those qualities that I listed, but given the chance to talk to someone in person, I would not use any component of the application package to judge that, and I think any advisor worth their salt would feel the same. So for you, the candidate, the challenge is to get through this mass prescreening round to have the opportunity to speak with your future advisor, which is why I have no shame in saying: all else being equal, the best thing to do is to make your evaluator’s life—my life—easier.</p>
<hr />
<iframe width="560" height="315" src="https://www.youtube.com/embed/eSRNzNF9rgM" frameborder="0" allow="autoplay; encrypted-media" allowfullscreen=""></iframe>Richard GaoSo why the hell did I write this thing? Let me re-iterate: this post isn't about being a good PhD student or being competent at research, it's about how to optimize your application so that you have the chance to do what you want to do in science, by first passing the hurdle of differentiating yourself enough in a batch of 300 applicants.Neuronal Timescales - the Director’s Cut: Third Research Paper Published (Part 3/3)2021-01-23T00:00:00+00:002021-01-23T00:00:00+00:00https://rdgao.github.io/blog/2021/01/timescale-part-3<p>This is Part 3 (of 3) of the blog series on neuronal timescales, you can find here <a href="/blog/2021/01/13/">Part 1</a> and <a href="/blog/2021/01/17/">Part 2</a>. There’s finally some <strong>useful</strong> stuff in this one, I think, as well as some strong opinions on scientific publishing (constructive and…ranty).</p>
<h1 id="10-tenured-professor-runs-own-analysisoctober-2019">10. tenured professor runs own analysis…(October 2019)</h1>
<p>…and graduate student proceeds to <em>almost</em> break it.</p>
<p>Part 2 got up to around Christmas 2019. For this final story, I have to rewind a little bit, back to late October of 2019. This is right around SfN2019 in Chicago, and I gave a poster presentation on all the new results so far, including the anatomy and gene expression correlates. It went really well, and it was the last SfN as graduate students for the rest of the original cohort in the lab (but the first time shotgunning a beer for some). Right around then, we were starting to think about writing this work up and publishing it, reasonably confident that it would be quick (it was not) and that I’d be graduating soon (I did not).</p>
<p><img src="/assets/images/blog/2021-01-23-two_weeks.png#center" alt="two_weeks" /></p>
<p>For a paper like this to be meaningful, at the end of the day, you need to show some kind of behavioral relevance, otherwise “it’s just a methods paper”. This is especially true if most of the results, while cool, were recapitulating existing knowledge from non-human work. So we got to brainstorming. I say “we”, but in this particular case, it was actually just Brad, and I didn’t have much to do with prototyping this idea until we were ready to QA/refine the code and make the figures (though I almost broke the analyses a few times). I have two very vivid memories that summarize how this happened, and both had the flavor of “I thought tenured professors didn’t run their own analyses anymore.”</p>
<hr />
<p><strong>Moment 1:</strong> I’m sitting in lab minding my own business, as one does pre-pandemic, probably making my SfN poster. Brad comes in and goes “yoooo check this out”. We had talked about trying to show some behavioral relevance already, but apparently he couldn’t bear waiting for me to start digging around, so he went and did it himself. Obviously we weren’t going to run our own experiment at this point of the project (and given the historical precedent, or the lack thereof), so he found a human ECoG dataset collected during a working memory task that a friend from Bob Knight’s lab had deposited on CRCNS (the other patron saint of my PhD). This is vivid in my mind because we were sitting at the round kitchen table in the lab, and Brad described the experiment to me, and asked whether I thought timescales would increase or decrease during working memory maintenance…</p>
<p><img src="/assets/images/blog/2021-01-23-wm_task.png#center" alt="wm_task" /></p>
<p>The structure of the task was standard as far as working memory experiments go: there’s a pre-stimulus baseline period, then a set of visual stimuli comes on the screen, then a blank delay period, then a cue for a context-dependent recall response. In this case, there’s a pretty straightforward hypothesis: if working memory is the active maintenance of information, then it would be reasonable to expect the neural activity in the regions involved to have “more history”, or a longer timescale. It’s a rather naive guess, but I went with it and said “increase”, and he had this dramatic slow-roll with a suppressed smugness on his face: it actually turned out to be true! We scrolled through dataframes in a Jupyter notebook and he called up the pre vs. post t-test statistics, and yup, there it was. During the delay period (holding onto information), timescales in all the recorded regions saw an increase relative to baseline (obviously the nice figure here was my doing, tyvm).</p>
<p><img src="/assets/images/blog/2021-01-23-pre_post.png#center" alt="pre_post" /></p>
<p>I was swelling with pride—my associate professor of an advisor is ready to become a PhD student again. But seriously, it was really weird and really fun to have that role reversal.</p>
<p><img src="/assets/images/blog/2021-01-23-PIs_code.png#center" alt="PIs_code" /></p>
<hr />
<p><strong>Moment 2:</strong> before the pandemic, Eric and I got into a nice habit of going to this bar in San Diego for house music on Wednesday nights (RIP Blonde). On one such Wednesday, we were at said bar and I’m shuffling my feet, and I get a Slack message from Brad:</p>
<p><img src="/assets/images/blog/2021-01-23-mind_blown.png#center" alt="mind_blown" /></p>
<p>Note the timestamp (it was actually 11PM PST, so not quite as bad as it looks…for both of us). This was apparently 2 days after Moment 1 above (and 2 days before flying out to Chicago), where I said I was going to check over his code, and I obviously hadn’t looked at it in the ensuing 48 hours. I don’t want to make it seem here like Brad’s always on my ass after work hours, because he never is. I’m usually the one sending random messages in the dead of night because that’s when I work sometimes. But here we are on a Freaky Friday Wednesday, and I remember this SO vividly because I’m now standing in the middle of this tiny crowded dance floor staring at my phone aka highschool-me. Alright, my interest is piqued, shoot:</p>
<p>So what he had done, beyond looking at within-individual timescale changes from the pre- to post-delay period, was look for <em>across-individual</em> trends. Specifically, was there a relationship between a person’s modulation of timescale and their average performance on this working memory task? This would be kind of crazy cool even without making any causal claims, because it’s a direct correlate of behavioral outcome (though I have lots of rants about this anyway). And yeah, <strong>consider my mind blown</strong>, because it turned out, across the 15 or so people in the dataset, the better their average performance was (x-axis), the more (longer) their timescale increased from pre- to post-delay (y-axis), and this effect was only significant for PFC.</p>
<p><img src="/assets/images/blog/2021-01-23-pfc_wm.png#center" alt="pfc_wm" /></p>
<p>Going all the way back to the radioactive decay analogy: the more volatile (shorter timescale) the element, the faster all of it disappears, and the less useful it is for reconstructing events farther back in history. Except, in the case of brain activity, we are apparently able to modulate the timescale of our neural activity based on behavioral demands, and the greater the modulation, the better one performed (or vice versa, not sure), at least in this very simple task. There are lots of caveats and open questions here, especially in the causal direction of the modulation (is it brain modulating behavior, or behavior modulating brain, and where do neuromodulators come in?).</p>
<p>These were precious moments, from which came the core of the final analysis in the paper (Figure 4A-D). Of course, once again, the “checking it twice” part took way longer than making the list itself. In this particular case, I inherited the notebooks, refactored, broke the analysis, fixed it, broke it some more, fixed it some more, and ran it through the gauntlet of tests until we were both reasonably confident that the result was not a fluke. These next two snippets of messages perfectly capture this sentiment, and the project as a whole (and really, <em>science</em>).</p>
<p><img src="/assets/images/blog/2021-01-23-holyshit.png#center" alt="holyshit" /></p>
<p><img src="/assets/images/blog/2021-01-23-fixedit.png#center" alt="fixedit" /></p>
<hr />
<h1 id="11-the-wrap-up-early-2020">11. the wrap-up (early 2020)</h1>
<p>…yeah, so it did not take two weeks from then to finish the paper. But this wasn’t due to anything extraordinarily frustrating or stupid; it was just plain academic optimism and the inability to estimate time into the future. I had presented this complete set of results (including the behavioral stuff above) a few times, including to my PhD committee in my pre-defense meeting in January 2020. We collected all the feedback, prioritized the most damning criticisms, and did some extra work to address them. Protip: obviously, comments and feedback from trusted advisors almost always improve the project. But there’s an additional art to getting your paper trashed preemptively, so that you can address those shortcomings before the first round of reviewers has an opportunity to trash it harder. This saves time and emotional turmoil for everybody.</p>
<p>In this case, there were concerns over the validity of the correlations between the cortical timescales and the T1T2 and gene expression maps. I did a couple of things to mitigate that, including the gene ontology analysis, as well as redoing everything after regressing out the T1T2 map from both the timescale and the gene maps, then correlating their residuals as a way of asking “which genes relate to timescales above and beyond what’s expected from this primary anatomical hierarchy?” Just like the previous sentence, none of this is terribly…sexy, let’s say, in the context of the main results, i.e., finding the correlations in the first place. But I think I’ve realized in the course of my PhD that, almost always, the time-consuming part is not the first result itself, but the checking and rechecking every which way afterwards.</p>
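<p>In case the “regress out, then correlate the residuals” step sounds fancier than it is, here is a minimal sketch of the idea with toy arrays and made-up names (not the actual analysis code, which also has to deal with parcellation and spatial autocorrelation):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Minimal sketch of the residual-correlation control: remove the linear
# contribution of the T1T2 map from both the timescale map and a gene
# expression map, then correlate what is left over. Toy data only; real
# maps are one value per cortical parcel.
import numpy as np
from scipy import stats

def residualize(y, x):
    """Residuals of y after regressing out x (with an intercept)."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

rng = np.random.default_rng(0)
t1t2      = rng.normal(size=180)                # anatomical hierarchy proxy per parcel
timescale = -0.5 * t1t2 + rng.normal(size=180)  # toy timescale map
gene      = -0.4 * t1t2 + rng.normal(size=180)  # toy expression map for one gene

rho, p = stats.spearmanr(residualize(timescale, t1t2), residualize(gene, t1t2))
print(f"residual correlation: rho = {rho:.2f}, p = {p:.3f}")
</code></pre></div></div>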
<p>Or maybe it’s not; maybe the endless thirst of finding out the answer for that very first time makes time itself feel like it’s running faster, and everything afterwards just feels like a big fucking drag. Alas, that’s science today: gone are the days when you can cook something up and send in a 2-page Nature Letter, literally like a correspondence to your bud. But I suppose that’s a good thing, and makes for better science. Plus, we have computers, Google, and open source data and software packages, like scipy, pandas, <a href="https://fooof-tools.github.io/fooof/">NAPP</a>, and packages like <a href="https://brainsmash.readthedocs.io/en/latest/">brainsmash</a> and <a href="https://brainspace.readthedocs.io/en/latest/index.html">brainspace</a> for more sophisticated spatial correlation analyses, which were a timely godsend. Can you imagine me trying to code all of this up myself for this project? It’d be a <strong>primo</strong> hackjob. All these things make my life infinitely easier as a scientist, and none of this would be possible without them. This really made me appreciate how much foresight the people who started these various large-scale efforts had (like HCP, the Allen Institute, and CRCNS), as well as the open source software development communities. I don’t know how well I’ll keep to my word here (because laziness), but reflecting on this right now makes me genuinely want to not write shit code: not from a perspective of “that’s obviously the right thing to do”, but from a deep place of appreciation inside my heart, because maybe one day someone else will find a few lines I wrote useful.</p>
<hr />
<h1 id="12-and-then-time-stood-still--a-rant-and-protip-march-1-march-124310798-2020">12. and then time stood still + a rant and protip (March 1-March 124310798, 2020)</h1>
<p>Judging by how things were going, 2020 was going to be great! All the main results were consolidated, and even though it took longer than we had thought, the paper was going to be finished soon. I had planned a short 3-month visit to Jakob’s lab (then in Munich) from April to June, so I worked quite a bit in that first quarter of 2020 to wrap up this paper before I left San Diego. Obviously, neither of those things materialized, and we all know what came next…</p>
<p>I suppose I was able to take advantage of the stay-at-home order, especially when California was first taking the pandemic seriously. I even got to see (and surf!!!) the fabled <a href="https://www.youtube.com/watch?v=GQ08UJ8QNtg&ab_channel=ABC10News">Blue Tide</a> because I wasn’t able to leave San Diego. But it’s been 329487 days since pick-up basketball, and there was even a month of surf ban, so what was I to do except hole up in my apartment and write? Two months went by with the snap of a finger; I lost track of day and night—quite literally, actually, because on top of intermittent fasting, I tried biphasic sleeping for a few weeks (lol). I also tried making cold brew with wine. It was…disappointing. Look, this is the kind of weird shit a graduate student who lives by himself and can’t go outside ends up doing; at least I didn’t start drinking soylent again… Yeah, I mean, at this point in the story, there’s nothing really exciting happening anymore, so I guess I will write something that may be of use to you, my loyal readers:</p>
<p>Even though the paper took a solid few months longer than anticipated to finish, it could’ve taken much longer. The reason it didn’t is that I didn’t actually have to write much—most of the results and methods were already written. My current workflow, roughly, is to run heavy-duty/slow analyses in python scripts, and save out the intermediate results to disk (in whatever format). All the final-stage computations are then performed in Jupyter notebooks, including collecting the intermediate results into pandas dataframes, doing statistics, and making figures. In doing this, I’ve gotten into a habit of writing long-form paragraphs describing the code and the subsequent results in the markdown cells in the notebooks themselves. Some of these even come with a “background” section, with some relevant literature referenced within, so that each notebook itself is almost a mini-paper on that set of results (see the <a href="https://github.com/rdgao/field-echos/blob/master/notebooks/4b_viz_human_wm.ipynb">working memory results</a>, for example).</p>
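<p>If it helps to see the skeleton of that workflow, here is a stripped-down sketch (file names, columns, and the “analysis” itself are all stand-ins): a plain script does the slow part once and dumps an intermediate file; the notebook only loads, aggregates, and plots, with the prose living in markdown cells right next to it.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hedged sketch of the two-stage workflow; names and paths are made up.
import numpy as np
import pandas as pd

# --- stage 1: the "heavy" script, run once, writes intermediate results ---
rng = np.random.default_rng(0)
rows = []
for subject in ["sub-01", "sub-02"]:                # pretend recordings
    for ch in range(4):
        signal = rng.normal(size=10_000)            # stand-in for an ECoG channel
        tau_ms = float(np.abs(signal).mean() * 20)  # stand-in for the slow timescale fit
        rows.append({"subject": subject, "channel": ch, "tau_ms": tau_ms})
pd.DataFrame(rows).to_csv("timescales.csv", index=False)

# --- stage 2: a notebook cell, loads the intermediate file and does stats ---
df = pd.read_csv("timescales.csv")
print(df.groupby("subject")["tau_ms"].describe())   # figures and prose live here too
</code></pre></div></div>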
<p>I definitely didn’t come up with this, nor do I do it the best. Many, many people before me have talked about the notebook as the final scientific product, and you can even include fancy things like widgets to play with analysis parameters. I also didn’t do this thinking that this would eventually be useful for writing the final paper, not at all. I did this because I fucking <strong>hate</strong> journal articles in their current form. I really, really do. We’re in the 21st century now; there’s electronic media and embedded hyperlinks, and even embedded executable code. There’s absolutely no reason to still hold onto pretend-paper as the de facto medium of scientific communication. This has been literally the same medium since the 1700s, except now we pay even more money for less physical material. Where’s the lie fam?</p>
<p><img src="/assets/images/blog/2021-01-23-notebook_ex.png#center" alt="notebook_ex" /></p>
<p>I get it, some people prefer paper, and even go as far as printing articles out. I am one of those people when I have a printer, and I <strong>love</strong> physical books. But the pdf is the least worst of all current solutions, not the optimal one, and we’re talking about what’s potentially an entirely different conceptualization of the scientific process, at least for computation-heavy fields. If all of us use code and data, and marinate the two together to create results anyway, why go the extra step of producing a 10-page text document elsewhere? Why not just write directly where the code is created? Yeah, maybe nobody wants to read a “paper” in C++ comments, but now we have things like markdown and Jupyter, <strong>where you can quite literally explain things in the thick of the action and look pretty doing it.</strong> So why not do that? If I had it my way, this set of notebooks would be what I submit to “journals”, or whatever peer review mechanism independent of the accreditation party ends up existing. That’s why I wrote the whole thing in the notebooks, and that’s why I truly love eLife as a journal (more on that later).</p>
<p>Anywayy, thank you for coming to my TEDTalk; hopefully you’ll consider doing this too now, and one day we can all start passing around runnable research notebooks as the scientific output, instead of saving pdf links that will never be opened again. But if you’re not convinced, here’s a very tangible benefit: instead of undertaking this massive paper-writing task in one go after all the results are finalized, writing up the methods and results in real-time as you finalize the stats and figures automatically chunks the work into intuitive pieces. <strong>Affordances!</strong> This is a protip even if you don’t write them in markdown in the notebooks, for whatever reason (you’re wrong). Just write them down somewhere—Google Doc, Evernote, whatever—and be as verbose as you can as you produce each result. <strong>Write as you go</strong>. In the end, just copy and paste the stuff you’d already written, wholesale, do some light editing, and that’s a paper!</p>
<p>That’s how I spent my first two months in quarantine. The preprint was up on bioRxiv the last week of May, we sent it off to Nature, George Floyd was murdered on camera, and I packed my car and went into the desert for a week by myself.</p>
<p><img src="/assets/images/blog/2021-01-23-jtree.png#center" alt="jtree" /></p>
<hr />
<h1 id="13-i-love-elife">13. I love eLife</h1>
<p>Yes, after all the hemming and hawing about dropping outdated traditions, we submitted to Nature—sue me for conforming to the current bar for getting an academic job. Mans gotta eat. In any case, I won’t speak to how the rest of the guys felt and will just share my own opinion: it’s not so much that I thought the paper was “good”, whatever that means, to pass the bar at Nature, because I know we’ve all gotten salty about somebody else’s work (“undeservingly”) published in Nature. It’s more so that I thought it ticked all the boxes for flashiness, especially with the wide range of data and methods we used, and for <strong>human</strong> neuroscience! To be more self-deprecating, it was the right kind of marketable garbage with broad appeal, or so I thought. It was eventually an editor-reject after two weeks, because it “lacked impact”. But it is what it is, can’t be mad, and I was in the desert for the first of those two weeks so I got my kumbaya under me.</p>
<p>I’ve already had my one rant per post, so I want to talk about something positive instead. Somebody told me a joke once that PNAS stood for “Post Nature And Science”. People are not going to stop sending to Nature and Science, for obvious reasons. But I think we should strongly consider eLife as the new “PNAS”. I’ve now served as both reviewer and author at eLife, and I’ve had overwhelmingly positive experiences both times. To summarize succinctly, I feel like I’m treated like a human being, living in the 2020s, and interacting with other human beings at their office, instead of typing into an electronic void that serves solely as the interface between authors and reviewers. No offense to editors/administrators at other journals, if you also make it a point to be human—you have my thanks.</p>
<p>First, eLife takes submissions directly from preprint servers (like bioRxiv), and they recently announced that they will start to <a href="https://elifesciences.org/for-the-press/a4dc2f54/elife-shifting-to-exclusively-reviewing-preprints">only review preprints</a>. Granted, there was still some extra stuff to be done and forms to be filled out in the process of formally submitting, but it’s definitely a move in the right direction. On top of that, if you so choose (as an author), the “raw reviews” you receive are <a href="https://elifesciences.org/for-the-press/a5a129f2/elife-launches-service-to-peer-review-preprints-on-biorxiv">posted back</a> on biorxiv itself, so people can see each reviewer’s responses to the original version of the article. This is valuable in many ways, the most important of which is the transparency provided into the review process. So much of the “scientific progress” actually comes in the form of ideating and negotiating during the reviews. As much as I hate to admit it, more often than not, the set of comments from 3-5 expert reviewers make the original work better, and even contain genuinely novel insights sometimes, so it’s a shame for those comments to never reach the light of day. At the very least, posting it publicly also clearly demonstrates if and when a single reviewer is being unreasonably obtuse.</p>
<p><strong>Perhaps my favorite thing about reviews at eLife</strong> is that, first, after all the reviewers have submitted their comments, there is a round of “consultation” for the reviewers and editor to hash out any disagreements outright, before anything reaches the authors. The editor reads and digests all the reviewers’ comments, and then organizes them into a single structured document, as opposed to 4 people’s separate notes. This task can range from distilling and re-interpreting all the reviewers’ points into a summary, to just moving the comments into shared thematic categories, and anything in between. If this doesn’t seem like a big deal to you, consider yourself lucky, because dealing with 4 disparate sets of (repetitive and/or conflicting) comments is a gigantic pain in the ass. As a reviewer, I always check the other reviewers’ comments because often it’s a relief to see that all of them share similar sentiments on a particular point, and that I didn’t say anything incredibly stupid in hindsight. This now happens in the consultation. Also, every author and every editor has to do this internal collating anyway (at least I hope so), so why not just save the authors some time and effort, and send out those notes as a distilled guideline?</p>
<p>This whole process acts as a quality assurance step for the reviews, and also explicitly sets the revision priorities for the authors, which is extremely valuable for all parties involved. Then, when the reviews reach the author (me), all I have to do is go through the document one bullet point after another, more or less confident that the comments are internally consistent. Finally, if the paper is fortunate enough to be accepted after revision (encouraged to be limited to a single round), the <a href="https://elifesciences.org/articles/61277#sa1">reviews and responses are published</a> as a part of the article itself. Again, maximum transparency, and this begins to truly feel like a productive scientific discussion, instead of adversarial combat over who can write the most words until the other parties are too worn down to keep arguing. Please, journals (and editors), take some responsibility as the middle person in this process; let me know that you are alive and have important opinions, and don’t just dump the task of placating the whims of the reviewers on the authors. If you are an eLife editor reading this, you can always count on me to review for you, 100% (but timeliness not guaranteed).</p>
<p>Lastly, and relating to my earlier point about no longer being in the stone age of academia, in 2020 eLife launched an effort called <a href="https://elifesciences.org/labs/dc5acbde/welcome-to-a-new-era-of-reproducible-publishing">Executable Research Article (ERA)</a>, which is basically live runnable code embedded in the online version of the article. “Whaaat? No way!”. YES. F-ING. WAY. ALL THE WAY. In this format, you can click open a figure, and change the code on the spot to see how it affects the resulting visualization. I was so quick to volunteer for this thing after our paper was formally accepted that the ERA team told me I actually had to wait till the nice typeset version was live on their website first, which took about a month. So I actually just started this process yesterday, which, as far as I understand, involves me copying and pasting code over from my Jupyter notebooks and linking it to the correct spots in the paper, in between the relevant text—basically, a literal fucking reproduction of what my notebooks already are. By the amount of profanity littered in this paragraph, you can tell I’m emotional—I’m not angry, I’m ecstatic. <strong>This is my dream come true.</strong> Obviously I’d be happier if we could’ve just started from my notebooks in the first place, 6 months ago, but that’s not on eLife. Funny enough, as I was reading through the ERA guide yesterday, there was this gem of a blurb right at the beginning:</p>
<p><img src="/assets/images/blog/2021-01-23-stenci_quote.png#center" alt="stenci_quote" /></p>
<p>This is now my scientific side-quest: to see the day when eLife goes from PostNatureAndScience to <em>Before</em>NatureAndScience.</p>
<hr />
<h1 id="14-final-thoughts-on-reproducible-luck">14. final thoughts on reproducible luck</h1>
<p>And that’s a wrap! The reviews did take some time to get back, but in the end, these were the most positive and constructive set of comments I’ve ever gotten in my life for a journal article, and it was accepted shortly before I defended my PhD, so I’m really counting my blessings. If you followed this story all the way through, you should see—as I have alluded to many times—that this project was the result of many fortuitous encounters. Not only was I lucky to meet the right people at the right time, I was lucky to be in those situations in the first place, which means having the privilege to be advised by a “second-rate” neuroscientist (his words) at a first-rate institution, and thus having the resources and good name to take advantage of those opportunities. I recognize all of that; that’s what luck is: being able to capitalize on fortuitous opportunities.</p>
<p>Unfortunately, I can’t teach luck, and I can’t (yet) provide any real opportunities. I also don’t necessarily recommend stumbling around haphazardly like I did here; it’s hard to reproduce. So what can I recommend?</p>
<p>Well, basically all that one could do to be prepared when presented with such opportunities. One concrete thing I’ve already recommended above: if you already use Jupyter notebooks, treat them like the entirety of your project—code, data, and paper. Of course, that only applies to semi-mature results and ideas. But with that in hand, I found that it was much more efficient to send people early results in the form of a GitHub link, instead of rewriting long emails or reports, especially to people you’ve only just met at a conference or in passing.</p>
<p>Another thing, and this is not necessarily a recommendation, more so a note on what worked for me: <strong>read lots but at the right time</strong>. What does this mean? I think this really only applies to computational/theoretical work, but before I have a good concrete idea, I read relatively broadly, meaning all kinds of random shit from whatever subfield of neuroscience, and whatever I see on Twitter. This breadth gives me inspiration for ideas and connections between unrelated things. Once I think I have a good enough idea, <strong>I stop reading</strong>. This is partly because I’m absorbed in actually screwing around with code so I don’t have time to read, but it has a very tangible benefit: usually, the more I read, the more I convince myself that something’s already been done, and the more pointless my idea seems (and never the opposite). I don’t think this is untrue, because nothing is really new under the sun. But nothing is exactly the same either, and often it’s only because you have a unique perspective into the connections already that it feels like old news, but you may be the first and only one that sees that.</p>
<p>Sometimes the only reason for pursuing vs. abandoning the idea is actually thanks to the sunk cost fallacy: the more I’ve already done, the more I’m invested in making it an actual thing, even if it’s not particularly groundbreaking. But if I’m convinced that the idea is nothing new before I have a single result, I wouldn’t even bother to start. Again, it’s almost never identical, and there is so much we don’t know. Even if two people try to implement the <em>exact</em> same idea, they might end up having their own unique points. Then, when the time comes to place those results in the context of the existing literature, I obviously try to read as much of the relevant literature as possible, or, at the least, I try to collect them, to make sure I’m truly not reinventing an identical wheel. At that point, it’s a matter of crafting the research niche after the fact, which often looks different from the real story (i.e., this one). This is why I say to read at the right time. Not sure how this applies to experimental work where you need to be relatively certain of all the existing work before investing in running an experiment, and obviously this already assumes some level of technical competency while only lacking in inspiration, so better suited for year 3+ PhDs.</p>
<p>Last point, and this is advice more for myself than anyone else, because I think I’m still not great at this: <strong>talk to people</strong>. There is almost zero cost (or risk) in talking to someone about my research, and only benefits to be reaped in case of potential collaboration, or at the very least, a fresh set of ears and eyes for ideas. I know all this, and I (clearly) have no issues writing about my stuff, but I just can’t bring myself to talk about my research, especially at a conference. I feel like I still don’t have a good elevator pitch, and also I get tired of the “what do you work on? who do you work with?” exchange after the 10th time in the same day. I’d much rather talk about something completely non sequitur, like dinosaurs, and see where that leads us as human beings. Whatever the method may be, the point is, at some point, social capital (i.e., people) becomes the most effective way of amplifying your value for the world, though probably only if you enjoy their company in the first place.</p>
<p><img src="/assets/images/blog/2021-01-23-e_house.png#center" alt="e_house" /></p>
<hr />
<iframe width="560" height="315" src="https://www.youtube.com/embed/HE0UdDbw0Us" frameborder="0" allow="autoplay; encrypted-media" allowfullscreen=""></iframe>Richard GaoNot only was I lucky to meet the right people at the right time, I was lucky to be in those situations in the first place, which means having the privilege to be advised by a 'Second rate' neuroscientist (his words) at a first rate institution, and thus having the resources and good name to take advantage of those opportunities. I recognize all of that, that's what luck is: being able to capitalize on fortuitous opportunitiesNeuronal Timescales - the Director’s Cut: Third Research Paper Published (Part 2/3)2021-01-17T00:00:00+00:002021-01-17T00:00:00+00:00https://rdgao.github.io/blog/2021/01/timescale-part-2<p>This is Part 2 (of 3) of the blog series on neuronal timescales, you can find here <a href="/blog/2021/01/13/">Part 1</a> and <a href="/blog/2021/01/23/">Part 3</a>. I think this one covers most of the actual scientific “discoveries”, if you will. The serendipity/luck that went into making this happen is beyond my comphrehension, but I’d probably be still doing my PhD right now if these stories never happened.</p>
<h1 id="6-discovery-progression-of-timescales-in-human-cortex-early-2019">6. discovery! progression of timescales in human cortex (early 2019)</h1>
<p>2019: new year, old me, but I became a proud contributor to the Toronto Vomit Comet following the New Year’s Eve party, if that means anything to you. Damn, I gotta do stuff to graduate though? After we were both convinced that I didn’t screw something up in the analyses, Brad, of course, was like “well you gotta check it in humans…have people shown this for the whole human cortex???” I don’t know, because <strong>I haven’t read any papers yet,</strong> but I don’t think so. Worth a shot I guess? I’m also going to defer responsibility here because none of this was planned research…</p>
<p>Go on Google, type in “big human ECoG dataset” (or something of that nature, I shit you not), and stumble onto this <a href="https://www.mcgill.ca/frauscher-lab/research">MNI open iEEG database</a>—with task-free recordings from 1700 electrodes in 100+ epilepsy patients. I have never seen this before in my entire life of parasiting other people’s ECoG data, must be a scam. I download the data, holy shit it’s manually cleaned and organized by brain region, all the data nicely living in a big matrix of time-by-channel, with metadata for electrode location in MNI coordinates, PLUS patient age and sex. Seriously, <em>how lucky can you get</em>? Running this through a modified version of the macaque ECoG pipeline probably took a week or two tops. Plot timescale estimates by brain region: boom, <strong>neuronal timescale follows sensory processing hierarchy in humans as well</strong>—fast timescales in primary sensory and motor regions, slow timescales in multi-modal and association regions (frontal, hippocampus, etc.).</p>
<p><img src="/assets/images/blog/2021-01-17-human_hierarchy.png#center" alt="human_hierarchy" /></p>
<p>This was essentially the main finding of the paper, and it really was this easy and fast. On top of that, because the dataset came with patient age information, it was natural to check if there was an across-subject relationship between timescale and age. To be completely honest, this could’ve gone either way at the time and it would’ve made sense: I don’t know if I would’ve hypothesized timescale to increase or decrease with age, though I think that was mostly due to my own ignorance of the aging literature. In any case, a quick check showed that, indeed, timescale tends to decrease with age, and in almost all areas of the brain. Later analyses and converging evidence from various branches of literature would strongly suggest that this is what we should expect as well. I won’t expand on the details here, but the logic is roughly along the lines of aging-related cognitive decline, especially in working memory, as well as the loss of specific NMDA receptor subunit types.</p>
<p><img src="/assets/images/blog/2021-01-17-aging.png#center" alt="timescale_aging" /></p>
<p>By this point, I think I was also starting to read a bit more into the timescale literature, because… my prayers were heard by the LFP gods and our Cosyne abstract was accepted as a poster presentation <strong>against all spiking odds!</strong></p>
<p>I’m going to Portugal!</p>
<hr />
<p><strong>Another technical side note</strong>: to give a little more context, most of the investigation on neuronal timescale, like that John Murray 2014 paper, happens in model organisms (rodents, monkeys) where people can record single neuron action potentials, and there’s a particular way timescale is computed from spikes. The last 5 years of research in that area has shown time and time again that there is a hierarchy of timescales in both rodents and monkeys, so it should be the case in humans as well (your <em>reject-because-lacking-novelty</em> radar should be going off). But for whatever reason—presumably spike chauvinism—mainstream people weren’t really doing this in time series data like LFP and ECoG, and you can’t really get that many neurons from the human cortex, so it remained to be confirmed.</p>
<p>The one exception I know is that of Chris Honey’s <a href="https://www.cell.com/neuron/comments/S0896-6273(12)00944-0">earlier human ECoG work in 2012</a>, which already implicated a temporal processing hierarchy in the human auditory pathway by looking at broadband (or high gamma) timescales. So really, we didn’t invent anything new here, except to stumble upon a different (though slightly more precise) way of estimating timescales from the frequency domain, and to apply it to a much larger human ECoG dataset 10 years later. Oh, and there was also the difference between spiking and ECoG timescales, meaning our method likely measures a fundamentally different process, so maybe there is something new after all?</p>
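<p>For context, here is a hedged sketch of (roughly, as I understand it) how the spike-based estimate works: bin spikes into counts, compute the autocorrelation of those counts across time lags, and fit an exponential decay to read off the timescale. Toy data, not anyone’s actual pipeline:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hedged sketch of the spike-count autocorrelation approach to timescales.
# Toy data: an AR(1) process stands in for trial-wise spike counts.
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(1)
n_trials, n_bins, bin_ms = 200, 20, 50

counts = np.zeros((n_trials, n_bins))
counts[:, 0] = rng.normal(size=n_trials)
for t in range(1, n_bins):
    counts[:, t] = 0.8 * counts[:, t - 1] + rng.normal(size=n_trials)

# autocorrelation as a function of lag, pooled over trials
lags = np.arange(1, n_bins)
acf = np.array([
    np.corrcoef(counts[:, :-lag].ravel(), counts[:, lag:].ravel())[0, 1]
    for lag in lags
])

# fit acf(lag) = A * exp(-lag_ms / tau) + B and read off tau
def exp_decay(lag_ms, A, tau, B):
    return A * np.exp(-lag_ms / tau) + B

(A, tau, B), _ = curve_fit(exp_decay, lags * bin_ms, acf, p0=(0.5, 100.0, 0.0))
print(f"estimated timescale: {tau:.0f} ms")
</code></pre></div></div>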
<hr />
<h1 id="7-its-better-to-be-lucky-cosyne-to-sfn-march-november-2019">7. it’s better to be lucky (Cosyne to SfN, March-November 2019)</h1>
<p><img src="/assets/images/blog/2021-01-17-nata.png#center" alt="nata" /></p>
<p>Holy shit I’m in Lisbon, Portugal.</p>
<p>I’m really, really thankful for the collaboration that came out of this conference, not only because it was very productive and fun to work with these guys, but also because I’d otherwise feel very guilty about the fact that Brad keeps funding my exotic international work-vacations.</p>
<p>So I’m there by myself in Lisbon, and I wasn’t aware of anyone I knew at Cosyne because LFP and oscillations are the “exhaust fumes of cortical computation” (hahaha I will never let this go). I spent the first two nights working on a (rejected) grant and then the poster itself, as per usual. During my layover at London Heathrow, somebody said Lisbon is like a depressed version of Barcelona, because it’s all fun until suddenly everything shuts down at 12am. Not like I was going to go out by myself, but I did forget to bring a power adapter and ended up walking around the whole city the night I landed, trying to find a 7/11-like place to purchase one (did not succeed). Some hotel front desk person tried to sell me a lost-and-found one for 50 euros. Yeah okay there bro. I was eventually fortunate enough to make some friends of friends and have some dinner companions. None of this is relevant to the story; I just thought it was hilarious how woefully unprepared I was.</p>
<p>Anyway, the night of my poster presentation, I’m standing there with a box of <em>pasteis de natas</em> (pictured above) because nobody was going to come to an LFP poster, <em>especially</em> if it’s not one of the <em>Gangulis, Sahanis, Harrises, or the likes,</em> so I might as well enjoy myself. Incredibly enough, John Murray (of <em>Murray et al. 2014</em>) showed up, and I wasn’t quite sure what to do with myself for a bit, because I thought he’d think it all sucks or that it’s derivative (<em>because I quite literally ripped the numbers from his paper</em>). But we actually had a great conversation, and he suggested comparing the human cortical timescale maps to these “T1T2” (wtf was a T1T2?) and “gene expression” gradients from <a href="https://www.nature.com/articles/s41593-018-0195-0">Burt et al., 2018</a>, to see if there was a nice anatomical correlate to these dynamical timescales.</p>
<p>A bit later in the night, Thomas Pfeffer (my now co-author) came to check out the poster as well. I think we actually talked the night before at his poster, because I knew him from a brief Twitter exchange some years ago about this EI-balanced CROS model his friend was working on, so I recognized his name. He thought my poster was really nice, and suggested, uh, <em>comparing the human cortical timescale maps to these “T1T2” and “gene expression” gradients from Burt et al., 2018</em> (wtf was a T1T2 gradient???). Even more graciously, he says that his lab mate had made these maps for some other thing, and that he could just email them to me, free of charge. All I’d have to do is to sell him my soul. Just kidding, it was all pro bono. All I’d have to do is align my timescale maps in the same spatial coordinate frame and do <code class="language-plaintext highlighter-rouge">scipy.stats.spearmanr</code> (…more or less).</p>
<p>This is what I mean by <strong>it’s better to be lucky</strong>. More specifically, it’s better to be lucky enough to meet friendly, knowledgeable, and helpful people. Tom and I ended up hanging out a bit more the rest of the conference, especially during the workshops in Cascais. We found another LFP enthusiast in Martin Vinck, who liked the Voytekian line of work, and shared fables of when oscillations were all the rage at Cosyne some 10 years ago. The last night of the workshops was capped off with him playing the piano in the conference hotel bar and then buying us these gigantic German-sized pints before the wrap-up party (which he did not attend himself because it was his kid’s birthday the next day). Funny enough, Martin later became the handling editor for the paper at <em>eLife</em>. That’s just how it all worked out. I guess I’m writing this because this kind of stuff really gets me thinking about how amazingly weird life is. Sure, it was great that the collaboration ended up being productive and we had a friendly editor, but it’s not what I’m gonna be thinking about when I’m dead, you know? All that stuff pales in comparison to the memories of “<em>…uh is that Zach Mainen DJing at this club party?</em>”, and walking around the beach boardwalk in Cascais for lunch together talking about how hard academia is. Well, pretty lucky that this was at least part of my job.</p>
<p><img src="/assets/images/blog/2021-01-17-cascais.jpeg#center" alt="cascais" /></p>
<p>I emailed Tom to ask about the data a few weeks after I got home (now April/May 2019), and that lab mate of his turned out to be Rudy van den Brink (our final co-author), who had already done a bunch of work to turn these “T1T2” and “gene expression” maps from voxel space to cortical surface space, and it was literally just in a big feature-by-region matrix ready to go. Screwing around with comparing those maps to the cortical timescales takes us almost to the end of the year, where I presented my final SfN poster as a PhD student in Chicago (minus a nice 3-week break in the summer where I ended up <em>back in Lisbon</em>, of all places). I was eventually able to convince Tom and Rudy to become proper co-authors, thankfully, because it would’ve taken me 5 times as long to handle that data and write about it myself without the external guidance.</p>
<p>…alright so what <strong>are</strong> T1T2 and gene expression maps?</p>
<hr />
<h1 id="8-technical-tangent-2-cortical-gradients-and-hierarchies">8. technical tangent 2: cortical gradients and hierarchies</h1>
<p>Scientific context: at this point in the story, all we had was timescale values from the task-free MNI dataset (and the macaque results). It was a semi-replication and extension of existing ideas to humans, which was awesome, but ultimately, what’s the “new thing”? Tom’s idea was that we could look at how brain <em>structure</em> relates to these timescale values. This would give us a clue about how brain anatomy shapes brain dynamics.</p>
<p>What I had literally zero idea of, prior to having that conversation, was this recent explosion of works measuring anatomical hierarchy in the brain (in rodents, macaques, and humans), especially by way of “macroscale cortical <em>gradients”</em>. There’s a pretty influential idea in neuroscience that the brain is <em>hierarchically</em> organized (Google “Van Essen diagram” for nightmare). It’s actually a pretty loosely used word (see <a href="https://royalsocietypublishing.org/doi/full/10.1098/rstb.2019.0319">Hilgetag et al., 2020</a>), but here, I’ll just handwave and say that there’s a progression of anatomical features along the cortex, which enables different stages of cortical computation. Put more concretely, the sensory/motor areas of our brains smoothly transition into the multimodal association areas, like from primary motor cortex to the prefrontal cortex, which follows the progression of <em>functional hierarchy</em>, where motor cortex is responsible for immediate motor outputs, and prefrontal cortex is necessary for planning sequences of actions over the longer term. This forms the “sensory-to-association axis”, and I hope it’s starting to smell familiar.</p>
<p><img src="/assets/images/blog/2021-01-17-huntenburg.png#center" alt="huntenburg" /></p>
<p>Unbeknownst to me, a bunch of recent works, taking advantage of technical advances in various imaging methods as well as really big open data collection efforts like the Human Connectome Project (HCP) and Allen Brain Atlas, have measured many anatomical features across the entire human neocortex. Many of these features follow this S-to-A axis, and these are what we colloquially call cortical gradients or maps, because they (more or less) vary smoothly across the entire human cortex. T1T2 (“tee-one-tee-two”) and gene expression, likewise, are two such measurements of brain anatomy/structure at two different spatial scales:</p>
<p>T1T2 (technically “T1w/T2w ratio”) is a non-invasive metric derived from MRI. It’s a proxy for how much grey matter myelination there is in an area, which is a proxy for the ratio of feedforward vs. feedback inputs to an area, which is a proxy of where along the processing chain an area sits, i.e., “anatomical hierarchy”. Don’t worry, it took me many, many days to get this straight, and I still need to pause to think about it when I see my own figure. The important bit is, the more “association-y” an area purportedly is, the less grey matter myelination there is, and hence the lower T1T2 value it registers from MRI. This is something you can get relatively painlessly: just stick someone in an MRI scanner, and out come these values. I don’t know, I obviously don’t collect my own data, and thank god for the HCP so I don’t need to.</p>
<p><img src="/assets/images/blog/2021-01-17-T1T2_gradient.png#center" alt="T1T2_gradient" /></p>
<p>Cortical gene expression, on the other hand, is NOT something you can get painlessly. The Allen Brain Institute collected 6 post-mortem (<em>deceased</em>) human donor brains ~10 years ago, and was able to measure the expression of some 20 thousand genes at many different sites. Roughly speaking, “expression of gene X” in a brain area = “how much protein X” there is in that brain area. <strong>HUGE disclaimer</strong>: MORE OR LESS, probably more less than more more—it’s a big ongoing area of research, and there are more caveats than results; that’s why we have jobs. But again, for brevity’s sake, more gene expression = more protein. What are these proteins? They are the shit that makes up everything in your body. In this particular case, we are interested in little machineries that can define cell types, facilitate transport of ions in and out of neurons, or form neurotransmitter receptors at synapses. Amazingly enough, the expression of these 20 thousand genes follows, to a first approximation, the same sensory-to-association axis as well: some types of synaptic protein become more abundant going along that gradient and some become less so; some follow it more closely, others not so much, etc.</p>
<p><img src="/assets/images/blog/2021-01-17-gene_gradient.png#center" alt="gene_gradient" /></p>
<p>With these maps—which, granted, are average measurements taken from different people at different times, so no causal claims here—we can get a pretty crude but comprehensive look at what anatomical features are related to timescale, i.e., how <em>structure may shape dynamics</em>. The prediction is that timescale should inversely correlate with T1T2 (because timescale increases going up the hierarchy while T1T2 decreases), and that depending on which of the 20k genes we are looking at, there would be different expected correlations with timescale. For example, if a gene encodes for a synaptic receptor that prolongs synaptic currents, it should positively correlate with the timescale gradient, and vice versa.</p>
<p>Of course, both of those predictions were confirmed, otherwise I wouldn’t be writing this.</p>
<p><img src="/assets/images/blog/2021-01-17-structural_corr.png#center" alt="structural_corr" /></p>
<hr />
<h1 id="9-all-we-are-is-dust-in-the-wind">9. all we are is dust in the wind</h1>
<p>That’s basically the crux of the anatomical analysis in the paper (Figures 2 & 3). One of the most consistent criticisms towards these analyses was that everything correlates with everything else in the brain, especially these large-scale gradients, for a variety of reasons, including the progression of cortical development, non-trivial spatial autocorrelation, etc. “So what if timescales correlate with cortical myelination, which correlate with gene expression, which correlate with timescale? Is this significant/surprising in any way?”</p>
<p>We included more sophisticated analyses to address this, like gene ontology enrichment analysis to find (in a blind way) clusters of genes that are strongly related to cortical timescale and are functionally related to each other, like synaptic proteins. In addition, I was able to include some analyses comparing the macroscale gradient relationships to single-cell level data. This latter one is probably my favorite analysis of the whole paper. It was certainly the most surprising result for me, and the one that convinced me that there might actually be something <em>real</em> here—something more than just screwing around with numbers in matrices—in a very fundamental biology kind of way.</p>
<p>More importantly, the serendipity behind how that came to be makes me chuckle every time I think about it: apparently, Brad knows Shreejoy Tripathy from back when neuroscientists answering questions on Quora was a thing. Shreejoy was kind enough to invite me to come by the lab at CAMH in Toronto whenever I was home to chat about the organoid stuff. I came home for Christmas 2019, took him up on his offer, and he ended up buying me lunch at this jerk joint I loved to go to when I was still at UofT (Tasty’s at Spadina and College, 5 dollar lunch special, you’re welcome). I met some folks in the lab, got some holiday cookies they were decorating, and inevitably discussed whether organoids had consciousness (jokingly, of course). I mentioned this new thing we were working on to him offhand, and he was like, “huh, you should check out these timescale-related genes we found in our new paper that just came out.” Like, I would have never in a million years thought to do this, because I was not aware of its existence. By “it”, I mean the <em>entire subfield</em> of neuroscience that is single-cell profiling (you: “bro is this guy even a neuroscientist???”). Nevertheless, a month later, I was ripping numbers from a supplementary table in <a href="https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1007113">Bomkamp et al., 2019</a>. Can’t make this shit up: space dust coalescing on a space rock, talking to each other over jerk chicken.</p>
<p>Is there a takeaway here? I’m not sure. At the start of this project, I was a signals guy. All of this stuff is the kind of random shit that I’d never expect to learn, but end up having to by pure coincidence, and that’s kinda how it goes in the PhD—rabbit hole after rabbit hole. I thoroughly enjoyed it, and it obviously made the science much more interesting, because now we’re making a link between brain structure and brain dynamics. The actual “result” in those figures above took, I don’t know, 10 minutes to get? Because you just throw these arrays in a loop and correlate. The work leading up to that point, and certifying those results afterwards, are the things that took much longer (if Glasser parcellation, multiple comparisons, and spatial autocorrelation mean anything to you)—just so you don’t walk away from this thinking I did <strong>absolutely nothing</strong>. Also, I gotta say, it’s funny now as I write about how I didn’t know any of the literature beforehand, but it damn sure would have made my life easier if I had read those gradient papers earlier. But hey, if you don’t read, make sure you make friends with people who do.</p>
<p>In the <a href="/blog/2021/01/23/">next and final installment</a>, I will shock the world by telling a story of a tenured professor running his own analysis, and his graduate student f-ing it up. Oh and, you know, the pandemic…</p>
<hr />
<iframe width="560" height="315" src="https://www.youtube.com/embed/q-_qX74UJKE" frameborder="0" allow="autoplay; encrypted-media" allowfullscreen=""></iframe>Richard GaoI guess I'm writing this because this kind of stuff really gets me thinking about how amazingly weird life is. Sure, it was great that the collaboration ended up being productive and we had a friendly editor, but it's not what I'm gonna be thinking about when I'm dead, you know? All that stuff pales in comparison to the memories of '...uh is that Zach Mainen DJing at this club party?', and walking around the beach boardwalk in Cascais for lunch together talking about how hard academia is. Well, pretty lucky that this was at least part of my job.Neuronal Timescales - the Director’s Cut: Third Research Paper Published (Part 1/3)2021-01-13T00:00:00+00:002021-01-13T00:00:00+00:00https://rdgao.github.io/blog/2021/01/timescale-part-1<h1 id="1-preamble">1. preamble</h1>
<p>The <a href="https://elifesciences.org/articles/61277">third and final paper</a> from my PhD thesis is published, just in time for when my degree is processed at the end of 2020. I am irrationally happy and proud about this fact, because of the weird feeling of completion it gives me. Like, I’m done. But what I feel more strongly is a <strong>marvel</strong> at how shit works out sometimes. This paper, as a scientific discovery, has its own narrative that best embeds itself within ongoing research efforts. This narrative is simultaneously true and fabricated: from the introductory references to earlier works, through every step of the results, to the future outlook—these are 100% scientifically accurate. At the same time, as I’d written <a href="/blog/2020/01/19/">earlier last year</a>, this narrative is typically organized after the fact. The process of discovery often looks nothing like the structure of the paper, but it’s this process and this story that I find more fascinating, both in my own work and in that of others. That story is one that embeds in my own life over the last two years, through ups and downs, and that’s the story I want to tell here today. I want to say that it’s for an educational purpose, but mostly it gave myself some good chuckles.</p>
<p>Life’s comedy is inspired by tragedies, especially someone else’s. From that perspective, this blog post probably won’t be anywhere nearly as entertaining as the <a href="/oscillating-organoids/">organoid one</a>, because I’m not unleashing 5 years’ worth of pent-up frustration and despair in one sitting. It was all quite pleasant, actually. Nevertheless, I want to share some “behind-the-scenes” commentary, if for nothing else than as a reminder for myself that science (so far) is basically stumbling around in the dark until you find something, and then connecting all the most important dots in reverse (much like life itself). In fact, I think the central theme here is probably my appreciation for the incredibly fortuitous alignment of stardust in the universe to bring this together. My mom says my blog posts are getting way too long for one sitting, so I’ve cut the 8000-word essay into 3 parts. The next two parts will be released a few days later; you can find them here: <a href="/blog/2021/01/17/">Part 2</a> and <a href="/blog/2021/01/23/">Part 3</a>.</p>
<p>If you want a quick overview of the scientific points in plain English, there’s a nice “<a href="https://elifesciences.org/digests/61277/timing-the-brain">digest by eLife</a>” that summarizes the findings. Or there’s always the <a href="https://twitter.com/_rdgao/status/1341490596083593219">tweetprint</a>. My hope for this series of posts, however, is that witnessing how the process unfolded will give people a deeper understanding of the science, as well as, once again, of how science actually works in a very human world. I call this blog post the Director’s Cut because if I had my way, this is the paper I would have submitted to journals. Maybe if/when I become a graybeard. But before I get into that, what does the vanilla edit look like?</p>
<hr />
<h1 id="2-the-commercial-cut-curated-scientific-narrative">2. “the commercial cut”: curated scientific narrative</h1>
<p>In a nutshell, this paper presents a novel analysis method (i.e., a little math and code) for estimating “timescale” in neural time series—a fundamental quantity that characterizes the activity of a neuronal population over time (i.e., dynamics, more details below)—and applies it to measure cortex-wide timescale from invasive recordings (ECoG) in humans and macaque monkeys. More broadly, it combines several other datasets of various modalities, including bulk and single-cell gene expression, structural magnetic resonance imaging (MRI), and behavioral data to probe the physiological basis and the behavioral relevance of this “timescale” quantity across the brain. The narrative in the paper (and the eLife digest) implies that we identified an existing need to be able to estimate timescale from the human cortex, thus inventing a method to do so. Having done that, it enabled us to answer many outstanding questions that could not have been directly answered before, like which cellular and network properties shape cortical variations in timescale, how timescale is important for supporting functions like memory, and even more relevant for us human beings: how it breaks down when people get older.</p>
<p><img src="/assets/images/blog/2021-01-13-schematic.png#center" alt="schematic" /></p>
<p>As the schematic figure shows above, we address these questions with a kitchen sink worth of heterogeneous datasets. I had to learn a lot of random things and talk to people with expertise in those areas to be able to handle all this data. Regardless of the “discovery” itself, I’m happy about this point, and in some sense I feel like this is the “magnum opus” of my PhD because it adheres very closely to my belief that variation in brain structure gives rise to variation in brain dynamics, which is then hijacked for “computation” or brain functions in general. Therefore, I think it’s important to do work that can address as many points along this spectrum as possible—”full stack” neuroscience, if you will. If you take away nothing else but this point from here, I’d be very satisfied.</p>
<p>At the same time, however, I’d be lying if I said I knew these were the questions we wanted to answer, and how they were going to be answered, on day 0 of this project. Actually, that couldn’t be further from the truth.</p>
<hr />
<h1 id="3-birthing-the-project-from-extreme-volatility-october-2018">3. birthing the project from extreme volatility (October 2018)</h1>
<p>At the time this project began, I didn’t know what a “neuronal timescale” was, nor did I know that it had become an extremely hot topic in systems neuroscience over the last 5ish years. Honestly, not a single clue, and I’m not sure if Brad knew either. Instead, at the time I was on a deep dive into 1/f exponents and how the various analysis methods for time series data—in particular, power spectrum exponent and detrended fluctuation analysis (DFA)—related to each other in measuring <strong>scale-freeness in brain activity,</strong> after my first paper. Scale-free, as in…a total lack of (time)scale.</p>
<p>How this project started had nothing to do with science: we didn’t set out to measure timescale in humans because we inferred that it was an open question. Like I said, zero clue what that was. No, it was much more pedestrian than that. The ingredients necessary for the birth of this project were: 1) a very tumultuous period of my personal life, 2) a change in my eating habits, 3) a lab very tolerant of my antics, 4) the knowledge of a small and otherwise useless piece of math, and 5) luck.</p>
<p>I’ve told the story about the log-log argument (below) in a few talks already, including at my defense, but I never told the story of why I was so invested in that debate that morning. It’s not because I cared <strong>SO</strong> much about policing how somebody wants to plot their power spectra, it’s because I was <em>perpetually emotionally volatile</em> and extremely hangry that particular morning. Well, <em>every morning</em> in those few months: volatile, because I had been in a very long-term relationship, one that had sadly ended in the summer of 2018 (ingredient 1). I obviously never talked or blogged about it because it’s a private part of my life and I didn’t think it was relevant for science, but in hindsight, what happens in your personal life is very much a part of the science you do, for better or for worse.</p>
<p>In my case (and probably in every case), what follows an extremely difficult breakup is a lot of drinking (and usage of other substances), not so much sleeping, and an increased baseline level of irritability and sadness, among other emotions. To offset those very unhealthy fluctuations, I decided I was going to work out a lot and eat well, so much so that I thought I’d try intermittent fasting, which meant eating 3 meals between noon and 7pm (ingredient 2). That worked out great because 1) I was probably in the best shape of my life despite the enormous caloric intake in the form of liquid bread, and 2) I was really focused (on the non-hungover days, obviously) between 9am and 12pm because I’d work at home in those morning hours.</p>
<p><img src="/assets/images/blog/2021-01-13-hanger.png#center" alt="hanger" /></p>
<p>I still do this when I can, and the hanger is totally great for working <em>alone,</em> because I didn’t think about much else. What it’s <strong>not</strong> great for is when you have to interact with other people, in real life or virtually, so I don’t usually check messages. But, on this random day in October, the melting pot of {crippling depression, irritability and aggression at everything that shines in life, and hanger} that is me, happened upon a poor soul who shared on Slack a power spectrum plotted in semi-log.</p>
<p><img src="/assets/images/blog/2021-01-13-loglog_slack.png#center" alt="loglog_slack" /></p>
<p>I guess I should take this opportunity to apologize once again to everybody: I’m sorry, I was hangry. I’m currently digitizing my old notebooks before the move to Germany, and happened upon this gem from October 2-5, 2018:</p>
<blockquote>
<p>“Weird day. I snapped on Slack about the loglog plot. Not sure why I got so frustrated, I think because they were shooting it down without good reasons. In any case, then I couldn’t do anything all morning for some dumb reason. […] I’ve felt a subtle desire to just cry to somebody for a while now, like literally, and I’ve been surprised I haven’t been able to. […] intermittent eating has made me feel very energetic, even though I was a bit unhinged on Tues. I think I need to consciously manage this energy better.”</p>
</blockquote>
<p>Thankfully, my lab was (and has always been) tolerant of my moods (ingredient 3). But just to make sure that I don’t forget about it, this debate has reached meme status, with a custom Slack emoji (pictured above).</p>
<hr />
<h1 id="4-technical-tangent-frequency-representation-of-exponential-decay">4. technical tangent: frequency representation of exponential decay</h1>
<p>Okay so what’s the loglog fuss about? To review, the frequency representation (i.e., “power spectrum”, or <strong>PSD</strong>) of brain signals—EEG, MEG, and intracranial recordings (ECoG/LFP)—almost always follows a 1/f power law: 1/f here meaning power (or energy) scales inversely with (f)requency, such that lower frequencies typically have larger amplitudes when represented as sine waves. It is my belief that people can judge the slope of a line (or the straightness, for that matter) much better than they can judge the curvature of an inverse power law curve, and so given that we’ve been looking at this 1/f phenomenon in neural data for a long time now, it’d just be easier for everyone to assess the “quality” of the power law when you see it as a straight (or not) line, which it just happens to be when you make both the X- and Y-axes log-scale. Hence, log-log.</p>
<p><img src="/assets/images/blog/2021-01-13-semilog_loglog.png#center" alt="semilog_loglog" /></p>
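<p>If you want to see this for yourself, here is a tiny sketch (my own, not from any paper or course material) that plots an idealized 1/f-type spectrum both ways, assuming only numpy and matplotlib:</p>
<pre><code class="language-python">
import numpy as np
import matplotlib.pyplot as plt

freqs = np.linspace(1, 100, 500)   # frequencies in Hz
psd = 1.0 / freqs**2               # idealized power-law spectrum (exponent of 2)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.semilogy(freqs, psd)           # semilog: log y-axis only, so the power law looks curved
ax1.set_title('semilog')
ax2.loglog(freqs, psd)             # log-log: the same power law is now a straight line
ax2.set_title('log-log')
for ax in (ax1, ax2):
    ax.set_xlabel('frequency (Hz)')
ax1.set_ylabel('power')
plt.tight_layout()
plt.show()
</code></pre>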
<p>Through discussions with lab folks prior to this little fiasco and from looking at a lot of ECoG and LFP PSDs (including my first paper on 1/f exponent & E/I balance), I knew that more often than not, neural PSDs are not purely 1/f power law. Instead, when you plot the PSD in loglog, there’s often a little bendy part around 10-30Hz, which we now call the “knee”. The small piece of math (ingredient 4) I happened to know was that the presence of a knee that connects a plateau portion (flat line) and a 1/f portion (line sloping down) indicates an <strong>exponentially decaying process</strong>—something much less mysterious than power laws—and that where the bendy knee occurs in frequency (in Hz) is a direct transformation of the exponential decay “time constant” (in seconds), sometimes referred to as…“timescale” (math details in the methods section of the paper).</p>
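<p>For the curious, here is a minimal sketch of that relationship (my own illustration, not code from the paper): a process whose autocorrelation decays exponentially with time constant tau has a Lorentzian power spectrum, flat below the knee and falling off above it, with the knee frequency sitting at 1/(2*pi*tau).</p>
<pre><code class="language-python">
import numpy as np
import matplotlib.pyplot as plt

freqs = np.linspace(0.5, 100, 1000)   # frequencies in Hz

for tau in [0.005, 0.05]:             # decay time constants in seconds (5 ms and 50 ms)
    psd = 2 * tau / (1 + (2 * np.pi * freqs * tau)**2)   # Lorentzian PSD of an exponential decay
    f_knee = 1 / (2 * np.pi * tau)                       # knee frequency in Hz
    plt.loglog(freqs, psd, label='tau = %.0f ms, knee ~ %.0f Hz' % (tau * 1000, f_knee))

plt.xlabel('frequency (Hz)')
plt.ylabel('power')
plt.legend()
plt.show()
</code></pre>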
<p><strong>Exponential decay and decay time constants</strong> are ubiquitous concepts in the physical sciences, and most of you probably know them in the context of radioactive decay (e.g., carbon dating), so I will use that to explain it: different atomic elements differ in their “stability”, or how long they tend to keep a certain form before they spontaneously decay into something else. Sometimes, an atom decays into a different element entirely, like Carbon-14 beta-decaying into Nitrogen-14, which is really useful for determining the age of a fossil sample (by comparing how much C-14 is left against the stable C-12). Sometimes, they split off larger chunks, like Plutonium-238 shedding an alpha particle to become Uranium-234. For the purpose of this metaphor, it doesn’t matter what Carbon-14 decays into, just the amount of time it typically takes to decay (though the dissipative loss may have a nice analogy in the brain):</p>
<p>Carbon-14 has a “half life” of <a href="https://en.wikipedia.org/wiki/Radiocarbon_dating">5730 years</a>, which means if some sample of really old shit (literally) has 100 C-14 atoms today, 5730 years later, only 50 will remain, the other half having spontaneously decayed away. 5730 years from then, only 25 C-14 atoms will remain. So on and so forth. Carbon dating works by measuring how much C-14 is in a sample today, hence inferring how much C-14 was lost and how much time has elapsed since that bison made that poo (or something to that effect).</p>
<p>If you chart how much C-14 remains over time, you will observe an “exponential decay curve”—the <strong>fraction</strong> of C-14 lost is the same over any constant period of time. You can characterize this curve with a single parameter, and that’s its <strong>decay time constant</strong>, which is a <em>physical quantity of time (in this case, thousands of years)</em>. In contrast, plutonium-238 has a much shorter half-life than C-14 (87.7 years), so it’s more radioactive or volatile. If you chart the amount of Pu-238 over time, that curve falls much more quickly, like the yellow vs. the purple curves below. Note that, by convention, when we talk about <em>half</em> life, that number is reported as how long it takes for the “amount of stuff” to decrease by <em>half</em>. When we talk about exponential decay, the time constant reports how long it takes before we only have “1/e” amount of it left, e being Euler’s number here, 2.718….so maybe we should call it <em>e-Life…</em> whoa.</p>
<p><img src="/assets/images/blog/2021-01-13-acf_timescale.png#center" alt="acf_timescale" /></p>
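<p>(If you want the half-life vs. time constant conversion spelled out, here is a two-line sketch using the numbers above: the time constant is just the half-life divided by the natural log of 2.)</p>
<pre><code class="language-python">
import numpy as np

half_lives = {'C-14': 5730.0, 'Pu-238': 87.7}   # half-lives in years
for element, t_half in half_lives.items():
    tau = t_half / np.log(2)   # decay time constant: how long until 1/e of the stuff remains
    print('%s: half-life %.0f yr, decay time constant ~%.0f yr' % (element, t_half, tau))
</code></pre>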
<p>In formal parlance, we’d say that the process of radioactive decay of C-14 has a longer timescale than that of Pu-238, and intuitively, C-14 is more stable and has a longer “history” that one can utilize (like for carbon dating). This idea of a decay constant applies not only to radioactive elements, but to any measurable quantity that exhibits spontaneous (and “memoryless”) decay over time, and it doesn’t even have to be physical “stuff”. And as I soon discovered, the idea translates perfectly to measuring “the amount of brain activity”.</p>
<hr />
<h1 id="5-what-the-f-is-a-neuronal-timescale-still-october-2018-3-days-later">5. what the f is a “neuronal timescale”? (still October 2018, 3 days later)</h1>
<p>Back to the story: having gotten so invested in the log-log interpretation, I had to prove in some way that I was right, or at least that I didn’t make a stink for nothing. So I used our handy lab tool, <a href="https://www.nature.com/articles/s41593-020-00744-x">spectral parameterization</a> (thank god for Dr. Thomas Donoghue for making this thing so damn usable), to fit some modified 1/f curves (with the addition of the knee parameter, or a “Lorentzian”) to some macaque ECoG data from Neurotycho (the patron saint of my PhD).</p>
<p><img src="/assets/images/blog/2021-01-13-macaque_psd.png#center" alt="macaque_psd" /></p>
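<p>For the curious, here is a hedged sketch of that kind of fit, assuming the <code>fooof</code> (spectral parameterization) package and a synthetic Lorentzian spectrum standing in for the real ECoG data; the knee-to-timescale conversion follows the form described in the paper’s methods, so treat the details as illustrative rather than gospel:</p>
<pre><code class="language-python">
import numpy as np
from fooof import FOOOF   # the spectral parameterization tool mentioned above

# synthetic power spectrum with a knee around 15 Hz, standing in for real ECoG data
freqs = np.linspace(1, 80, 80)
tau_true = 1 / (2 * np.pi * 15)                               # roughly a 10 ms timescale
powers = 2 * tau_true / (1 + (2 * np.pi * freqs * tau_true)**2)

fm = FOOOF(aperiodic_mode='knee', max_n_peaks=0)              # fit offset, knee, and exponent only
fm.fit(freqs, powers)

offset, knee, exponent = fm.aperiodic_params_
f_knee = knee ** (1 / exponent)                               # knee frequency in Hz
tau = 1 / (2 * np.pi * f_knee)                                # inferred "timescale" in seconds
print('knee frequency ~%.1f Hz, timescale ~%.0f ms' % (f_knee, tau * 1000))
</code></pre>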
<p>What I wanted to prove was pretty trivial: that by including this knee parameter, the resulting 1/f exponent fit—the thing we actually cared about at the time—was much more accurate compared to slapping on a naive 1/f curve…because the whole thing is clearly not a straight line. The fact that the “knee” translated to a time constant (or “timescale”) estimate was inconsequential. But this is where ingredient 5 (luck) came in, for the first of many times over this project. I remember this really clearly because I was flying up to Berkeley to hang with some derps for the weekend (this was October 4), and because one of the nights resulted in this gem of a picture of me <em>triple clutching in one hand</em> (I swear, this was a special period of my life):</p>
<p><img src="/assets/images/blog/2021-01-13-triple_clutch.png#center" alt="triple_clutch" /></p>
<p>The <strong>lucky part</strong> was that I happened to be reading this paper on the plane: <a href="https://www.nature.com/articles/nn.3862">“A Hierarchy of Intrinsic Timescales Across the Primate Cortex”</a>. I’m not sure why I was reading it, probably because it was tangentially relevant to “scale-free”. But in this paper, the authors demonstrate that the activity of single neurons across the monkey cortex also exhibits this property of exponential decay. More interestingly, different brain regions have different decay timescales: sensory regions responsible for processing fast-changing perceptual information have short timescales like the more volatile Pu-238, while association regions thought to support long-term information integration (like in working memory and for making decisions) have longer timescales like the more stable C-14. This makes a lot of sense, and this work is one of the earlier ones that set off an entire line of research on timescales in the brain. But I’m highlighting this one in particular because, thank holy heaven, John Murray put the estimated values of the spiking timescales in Figure 1.</p>
<p>So then here comes lucky me: what if we compare these single-neuron timescale values from this paper with the timescale values I got from the Neurotycho ECoG data in the corresponding brain regions?? After some amateur-hour macaque cortex alignment (and I do mean <em>amateur-hour</em>), I copied over the single-unit numbers from the paper (x-axis below), then grabbed estimates from the ECoG electrodes that matched where the single units were recorded (y-axis below), plopped them into Python and did some averaging, and got this:</p>
<p><img src="/assets/images/blog/2021-01-13-macaque_figure.png#center" alt="macaque_figure" /></p>
<p>“holy shit is this real? …I must have plotted one set of numbers against itself?” That’s pretty much the conversation in my head. This is about as good of an “aha-moment” as I can hope for in the computational sciences. Came home after the binger weekend, showed it to Brad (<em>similar disbelief</em>), checked over everything a couple more times, ran a few other sessions from the Neurotycho dataset, and it didn’t break! That was the first result of this paper, probably late October of 2018, because it was right around the Cosyne submission deadline. I was working on some other weird dynamical systems thing that I was going to submit but none of it ever fucking worked. So I wrote this result up like 3 days before, Googled and slapped together a few references on macaque cortical timescales, sent it in, prayed to the LFP gods, and went on to do pretty much nothing the rest of 2018.</p>
<p>And those were a couple of months in my life that I still think about very fondly. Stay tuned for <a href="/blog/2021/01/17/">Part 2</a>, in which I get even luckier, but in Lisbon, Portugal.</p>
<hr />
<p><strong>Post-script for the aficionados:</strong> you will notice in the above plot that while the single-neuron and ECoG timescales follow each other very closely, they are actually off by a factor of 10. That is, single-neuron timescales are on the order of hundreds of milliseconds, while ECoG timescales are in the 10-50ms range. This is pretty consistent across animals (and species). This relationship remains a mystery, and I’ve been asked about this many times. It’s partially due to parameter choices in the analyses (i.e., 1Hz resolution in the PSDs), but I suspect it’s primarily because ECoG measures synaptic and transmembrane currents, which is a related but ultimately different (and faster) process than spike train autocorrelations, though no concrete proof is provided in this paper.</p>
<hr />
<iframe width="560" height="315" src="https://www.youtube.com/embed/Eq7wSIDjAes" frameborder="0" allow="autoplay; encrypted-media" allowfullscreen=""></iframe>Richard GaoHow this project started had nothing to do with science, we didn't set out to measure timescale in humans because we inferred that it was an open question. Like I said, zero clue what that was. No, it was much more pedestrian than that.NMA2020: (Late) Thoughts on NMA, Higher Education, and Computational Neuroscience [11/52]2020-09-16T00:00:00+00:002020-09-16T00:00:00+00:00https://rdgao.github.io/blog/2020/09/neuromatch-academy2020<p>Warning: a few sections have small to medium rants about higher education (in North America). The point is not my displeasure: the experiences that fueled them, especially if you’ve had similar ones as a graduate student or professor, serve as a stark contrast to further highlight the absolutely incredible accomplishment that is Neuromatch Academy. That being said, how NMA happened needs no backdrop of conventional institutional mediocrity to shine, but it certainly showed me what was possible for higher educations and summer schools moving forward.</p>
<p><strong>TL;DR</strong>: it was nothing short of inspiring to witness and to be a tiny part of this <a href="http://www.neuromatchacademy.org/syllabus/">summer school</a>, and I learned so much as a TA that I feel bad getting paid. Highly recommend. In the sections below, I recount my experiences as:</p>
<ul>
<li><a href="#waxer">a content reviewer</a>: what impressed me about the organization,</li>
<li><a href="#TA">a TA</a>: what worked pedagogically as a summer school, and</li>
<li><a href="#student">a student</a>: the coherent picture of computational neuroscience I took away from the NMA curriculum.</li>
</ul>
<blockquote>
<p>Shoutout to my wonderful pod: the Ethereal Ponies, and Sean’s serial killer note logo.</p>
</blockquote>
<p><img src="/assets/images/blog/2020-09-16-etherealpony.png" alt="" /></p>
<hr />
<h1 id="higher-education-today">Higher Education Today</h1>
<p>Having been an undergraduate student for 5 years, then a TA for 5 more, and eventually an instructor for my own course, all at research-focused institutions, I’m a bit disillusioned when it comes to teaching. It’s not that I haven’t experienced good teaching - I absolutely have. I’ve had the good fortune of being friends and colleagues with some people who care deeply about good and accessible education, and I’ve taken and TAed for classes where professors clearly put students first. However, time and time again, that has been the exception within institutions, not the norm.</p>
<p>Having been a PhD student, of course I understand the plight of the scientist-professor: if we suddenly had 20 more hours a week for free, sure, we might think more deeply about pedagogy for our 200-person class. But without that, how much time we actually dedicate to it is simply a function of what we choose to prioritize. The institution, whatever this monolithic word even means at this point, provides little incentive and meaningful feedback for instructors to teach well. Actually, no - the sole emphasis on research in many places actively discourages one from teaching better than the minimally acceptable thing under the current norm. So, if we had 20 more hours a week, we’d spend it on research, because that’s what gets tenure and prints papers (figuratively and literally). I’ve also heard professors actively discouraging their grad students from spending more time than necessary on TA duties because their main job is to do research, and the sad part is that that’s absolutely in the best interest of their students’ careers in academia. It’s just the norm now: the better teacher you try to be, the more time you spend not doing research, and the more often you are undervalued. And this is not even going into adjunct professors and lecturers.</p>
<p>Ironically, the R1 institution is a business whose top product is a good education, and I’m not sure if our customers are getting their money’s worth. This cannot be more perfectly summarized than by this <a href="https://sites.google.com/ucsd.edu/budgetcuts">recent letter</a> from some UCSD faculties that details their perspective on the ever-increasing enrollment without being provided sufficient resources to deliver quality education to their students. In my first year at UofT, we had a world-class research professor lecture for an intro bioengineering class, and I shit you not, they were late to class more often than I was, which is hard. This system doesn’t only produce uninspired teaching, it produces uninspired students. I really believe that the majority of students going into college want to learn something interesting, even if they were encouraged by their families to follow a more traditional career path. But it only takes so many 300-person lectures for you to realize that there’s no way that person standing up there is going to know your name, nor do they care to, so just give me my A and let’s all get on our way.</p>
<p>This is my (perhaps extremely cynical) perception of higher education as a scientist, and I’m a little tired of it. I’m tired for the people that are time and time again the minority actively choosing to fight the uphill battle to prioritize teaching because they believe it’s the right thing to do. I’m tired for the students that pay for a world-class education and get some uninspiring curriculum. The worst part is, after all this time, I get it - under the current system of priorities, that’s the natural equilibrium, and I’ve kind of accepted that that’s just the way it is if you care about pedagogy as a research scientist, and it’s a weird realization to think that one is a part of a niche community for doing the job they are paid for. Maybe it wasn’t that way 20 years ago, but it is that way now. It’s not really anybody’s fault, and it was kind of okay for a while, until a pandemic came to <a href="https://news.berkeley.edu/2020/07/15/update-on-budget-expense-reduction-measures/">stress test</a> both the financial and educational aspects of our current system, revealing the inner workings of these institutions and what they must do in response to keep the businesses afloat.</p>
<p>I bring all this up to serve as a contrast: contextualized in the systemic inertia of underemphasizing and undervaluing good teaching, how much could a ragtag team of volunteers within that system—all of whom are under the same constraints of their regular research duties, teaching load, industry job, and other administrative duties, during a once-in-a-lifetime COVID pandemic—accomplish in 4 months, attempting to set up the biggest interactive online summer school to date?</p>
<hr />
<h1 id="neuromatch-academy">Neuromatch Academy</h1>
<p>Turns out, a fucking whole lot. Against the backdrop of my cynicism, this whole thing has been nothing short of <strong>inspiring</strong> to witness and be a part of. The sheer effort and person-hours put in by these people—organizing, producing the content, and mentoring, all for free—is just mind-boggling. But it is only dwarfed by the quality of the curriculum and the students’ overwhelming responses to being a part of the community. Blowing things up and starting over is rarely the best course of action, but I genuinely believe that this will be (at least partially) the gold standard for computational training for summer schools, and potentially for university curriculum altogether. All it took is for a bunch of people that care deeply about teaching and learning computational neuroscience to make this community where focus on pedagogy and student experience is not the exception, but the norm. It’s hard to not feel like I’m proselytizing for NMA at this point. Of course, many parts of it are inspired by education research and existing summer school layouts, while other aspects could be better in future iterations. But I really hope and believe that this program, and this model of teaching, is here to stay.</p>
<p>I don’t think I have a lot of unique insight or information to offer that hasn’t already been covered in one of the debrief sessions during the course, and I highly recommend checking out the NMA <a href="https://www.youtube.com/watch?v=WiVIcJPGpsA">logistics</a> and <a href="https://www.youtube.com/watch?v=o2h3cUPmp1Y">final wrap up</a> videos to see what the behind-the-scenes process was like (some images in this post are taken directly from them). Nevertheless, the following is a report of my month-long journey with NMA, through the lens of a few different roles—waxer (content polisher), TA, and (accidental) student—and what I learned in each. I journaled most of these thoughts along the way, and the purpose of it is really to highlight aspects of this summer school that blew me away, things that worked well and things that could be improved for the future, as well as to provide a little more context for what computational neuroscience is (as presented in those 3 weeks) in the grand scheme of neuroscience and cognitive science.</p>
<hr />
<p><a name="waxer"></a></p>
<h1 id="as-a-waxer-content--scheduling-logistics">As a Waxer: Content & Scheduling Logistics</h1>
<p>The majority of NMA content is delivered to the students via tutorial Jupyter notebooks. A “waxer”, in NMA lingo, is a person that checks and polishes these notebooks before shipping, after high-level content decisions and coding exercises have been more or less finalized by the content creators of that day, which is itself a multi-step process (outlined below). I wasn’t familiar with any of this, neither were most TAs I think, because there’s no reason to be, it wasn’t part of my job. I somehow just assumed that these notebooks would pop out of thin air, perfectly organized and ready for students to use, which was dumb because I literally went through this process for my own class last summer. That was…non-trivial, and I only had to make 4 Jupyter notebooks for lab for 30 people, which ended up taking like 2 weeks of solid work to do, with no embedded videos, coordination between multiple teams, and more than a handful of bugs in the final product. But because of that experience, I was recruited to help with polishing these notebooks, so I was privy to the inner workings of NMA a couple of weeks before class officially started, and boy was that eye-opening. Stunning, probably, was the better word. As in, I was literally stunned when I joined the team Slack, scrolling through the different channels and witnessing the bustling activity 2 weeks before NMA started. This was a direct quote from my notes:</p>
<blockquote>
<p>“Patrick brought me on as a waxer on June 25, I was kind of stunned at just how much of a monster effort it was, from rounding up speakers for their videos, making and waxing tutorial notebooks, cleansing content of CC materials, etc. When I say stunned, I mean literally I was just sitting there going through all the docs to try to wrap my head around the organizational structure. Insane workload and insanely inspiring that these people were putting in so much time and effort to make this thing run, and most of which more senior than me personally.”</p>
</blockquote>
<p>It’s like one of those movie scenes, where you get invited to Charlie’s chocolate factory, and somebody opens the front gate and all of a sudden what looks like a shack from the outside is actually a gigantic compound with thousands of oompa loompas scrambling to put stuff together (not saying NMA folks are oompa loompas…), and you’re standing there jaws wide open until one of them crosses in front of you wheeling a comically large cart of stuff and says to you “please get out of the way sir.” Thankfully this was me browsing Slack and the various Google docs and sheets in my room, but I remember doing that for a solid half day. Coupled with the stream of TA-related emails we were starting to get from Carsen, it was like, “oh wow, this is what it’s like.”</p>
<p>My immediate first thought was: “how the hell is this happening? lol oh no there’s no way this can get done.” I think that was triggered by my witnessing of the emergency of that day (there were usually several a day), which was the realization that all NMA material had to be copyright-infringement free, so teams were rallying the content creators to remake their figures and/or get permission from journals so they can use figures as is in the lecture slides. I was literally remaking one panel of a figure for a paper submission that week and I spent a good hour fuming about it. Now we’re asking these faculties to redo their figures for a free summer school lecture, some of which look so old I’m sure they were originally printed in black and white? Major lol.</p>
<p>The second thought I had was: “hold on, I know of most of these people, either IRL, on Twitter, or I’ve read their papers. These people…all have regular jobs as graduate students, postdocs, or faculties. How are they spending THIS much time on this volunteer job for a summer school… and why???” I definitely had some small fanboy moments when I saw people I did know, and even the people I didn’t know, they seemed to be high performers in whatever industry/area of research they were from, accomplished or budding academics, across all the different teams, including the content day chiefs and the content creators themselves. Given my rant above on the institutional pressure to produce research and not waste time on teaching (much less volunteer efforts and outreach), you can see why I was a bit taken aback. Adding to that shock, TA duties were ramping up later that week, and we got this course daily schedule from Carsen. I can’t imagine how long it took for the organizers to design this, and you can’t imagine how long I looked at this thing trying to figure it out. So it was all just…a lot to take in.</p>
<p><img src="/assets/images/blog/2020-09-16-daily_schedule.png" alt="" /></p>
<p>Back to waxing. Why did waxers exist in the first place? It’s just another example of the embodiment of the values the organization had from the top down, and I paraphrase something Michael said in one of the linked videos above: “the organizers/content creators suffer so the students don’t have to.” Here’s a quick summary of my (limited) understanding of the content creation process:</p>
<ul>
<li>The content day chief and content creators (CC team) decide on the topics to be covered for the day and their progression. Then slides and videos for the intro and outro lectures, as well as micro tutorial videos embedded in the Jupyter notebooks, are filmed and handed over to the dedicated NMA editing & reviewing team. Videos are cut down to appropriate lengths, combined with slides (where the lecturer video is plopped into a little square in a corner of the slides), then uploaded and curated on YouTube.</li>
<li>At the same time, CCs create Jupyter notebooks with code exercises and explanations in a master notebook, and the reviewing team provides feedback for tutorial organization, content difficulty, and various other high level things concerning both technical and presentation aspects of the content.</li>
<li>Finally, the tutorials are handed off to the waxers to polish, which means embedding the microvideos where they are supposed to appear, making sure the notebooks adhere to the universal style guide, and making sure that the code runs smoothly and produces the expected outputs, etc. When all that is finalized, it’s pushed from Google Colab to GitHub, where Michael Waskom’s suite of continuous integration tools generates student and instructor versions (solutions are hidden in the student versions) and automatically checks for code execution, PEP8 compliance, and various other custom stylistic concerns.</li>
<li>That completes the <strong>first</strong> round of content creation, and the first version of completed NMA material is born. All of the course material then goes through “pre-pod”, where TAs pretending to be students go through the entire 3 weeks of material and provide detailed feedback regarding difficulty, clarity, and whether there is too much/too little material - basically, anything students might have issues with. That feedback is distributed to the content creators, and they modify the tutorials and reshoot some videos accordingly, pass those to the content reviewer team again for touch ups, which all have to be completed within a week. Finally, it makes it to the waxers, ready to be loaded into shipping containers for final push to GitHub, where this kind of stuff happens:</li>
</ul>
<p><img src="/assets/images/blog/2020-09-16-commit_woes.png" alt="" /></p>
<p>Just reading that summary feels like a lot; imagine the effort required to actually do it, from both the content creation and curation perspectives. This doesn’t even mention the “China team”, which has to figure out workarounds because Google stuff (e.g., YouTube) cannot be used in China, and I’m sure I’ve missed (or was never exposed to) a bunch of other teams that were crucial for the preparation of the material (sorry if that’s the case!). And this is only content, which is the meat of NMA for sure, but still just a component. Recruiting, logistics, sponsorship, student feedback, outreach & communications—all of these teams were needed to make this stuff come together. This also does not mention unexpected and non-school related hiccups along the way, like the “Iran Saga”, where in the span of a week, Iranian students and TAs had to be dropped because of the ongoing American sanctions on Iran, and then reintegrated because NMA was granted an exception. That was definitely an emotional rollercoaster, to say the least.</p>
<p>All this brings me back to the same point: the team was committed, top to bottom, to making this as seamless of an experience for the students as possible. You can easily imagine a version of this virtual summer school where each day centers on a Jupyter notebook (if not a plain old problem set PDF), and embedded in it are links out to YouTube videos filmed and uploaded by the content creators themselves, with no length or stylistic consistency in any of the modalities and bugs all over, which the TAs would have to troubleshoot in real-time. Instead, we have NMA, where you get content delivered to you by the absolute experts in their respective areas of computational neuroscience and machine learning, but in a way that feels completely consistent and coherent (I won’t say seamless, but it was close).</p>
<p>I couldn’t have imagined an experience like this before NMA, but it dawned on me that this is what university curricula for a single major should feel like: different classes taught by different professors, but with overarching connections and consistencies between classes. Obviously, making several college courses consistent within all possible trajectories and majors they feed into is a much more challenging issue, but we have millions of dollars to do it, and it’s certainly not impossible. Much of this was possible in NMA because a lot of industry-standard practices were borrowed, which is sort of ironic because universities actively borrow corporate practices on how the business should be run to generate revenue, but not how the basic product should be created and delivered to satisfy customers’ needs.</p>
<p>Anyway, there’s much more to say here and many details were glossed over, but I hope it conveys my impression that it was a daunting task that was pulled together by a bunch of passionate people that just wanted to get shit done (shoutout to my <em>much more</em> productive waxer-twin, Spiros Chavlis). This was from my notes on July 12, the day before NMA Day 1:</p>
<blockquote>
<p>I’ve only been a very small cog in this machine, mostly on the TA side and helping lightly with content reviews. I’d always thought that being a small cog in a machine, like working for a large corporation, would feel really meaningless. But this has been the complete opposite experience, not sure why. The values and mission from top-down has felt personal and meaningful, and working within the local teams has been extremely smooth, probably because everyone involved has been super dedicated and competent. The waxing team from coordinators to waxers have been really good. Just impressed all around. The final products are something that everyone is proud of, and rightfully so, because the tutorial notebooks are pedagogical, and consistently beautiful across. Of course the content creators deserve a ton of credit for this, especially for following the style guides so closely while the guidelines have been fluctuating. I can only imagine the kind of work the video editing team is putting in.</p>
</blockquote>
<hr />
<p><a name="TA"></a></p>
<h1 id="as-a-ta-pedagogy--day-to-day-execution">As a TA: Pedagogy & Day-to-Day Execution</h1>
<p>TAs are an integral component of NMA, and this follows from the philosophy that the school should have as many interactive experiences as possible. I’ve taken my share of online courses on Coursera and MIT OCW, and those are good if I’m really interested in a topic and can go through the lectures in small bites. But NMA was not that, and if you expect students to work through 5 hours’ worth of coding and tutorials by themselves every day, you’d probably…not have that many students left after 3 days, which would be a shame because so much effort was put into producing the content in the first place (it’d be interesting to see how many Observer track students actually finished all of the content, compared to interactive track students).</p>
<p>As in many university classes, TAs are the hands on the ground that carry out the whole operation and are the first point of contact for the students. Not only do the TAs have to be familiar enough with the content to guide their pod students and ask probing questions, they have to manage the interactions constantly, making sure everyone is contributing and engaged, as well as helping to sort out any logistics stuff. It was the explicit expectation that TAs carry the most significant portion of the teaching load, especially after the content had been finalized, and were thus treated like important members of the whole operation. To prepare us, NMA had 2 days of Lead TA training, then 3 days of TA training (which is not a lot), where Kate, Gunnar, Carsen and others went through the philosophy and logistics of the course, as well as what TAs’ jobs are and how they should be performed. More importantly, we got training on pedagogical tools like formative assessments and active learning. Those first few 5-hour Zoom sessions were a shock to my system for sure, but all of it was to ensure that everyone was on the same page in terms of philosophy, code of conduct, logistics, and having the appropriate “soft skill” tools to move forward, and it was completely necessary since many TAs were understandably anxious about the coming 3 weeks.</p>
<p>Side note (rant incoming), if you read the last paragraph and thought to yourself, “sure, that makes sense”, then you probably have not TAed for a university. As it turns out, graduate student TAs in most universities serve the exact same roles, but are more often than not treated as an afterthought. I don’t think it’s an uncommon sentiment that TAs are just the university’s henchpeople for doing all the unglamorous things, and are a begrudging necessity to keep the whole operation afloat. I suppose we should all be grateful to get a TA position that partly pays our fees and salary through grad school, depending on what your departmental support is like. But I’m not advocating for TAs to be treated like upper-class citizens around here, even though the infrastructure would crumble in less than a day if TAs went on strike (to be fair, professors are typically appreciative of the TAs’ work, though not always). I just think the institution should acknowledge all aspects of the job a TA has to do, and do the bare minimum to prepare people for them. Seeing how these are future professors-to-be, don’t we want to, I don’t know, make sure they 1) can teach, and 2) are happy teaching? Maybe me in first year wouldn’t have given a shit anyway, but the extent of my <em>formal</em> TA training was a half-day seminar in my first week of grad school, as part of the university-wide graduate student bootcamp. Not sure what I took away from it other than a boxed lunch and “don’t sleep with your students”. I think the logic is that if you’ve been a student in university, you should know how to TA. Of course, a lot of the training varies course-to-course, but most of it doesn’t. But surprising to nobody, one bullet point on a slide saying “encourage equitable classroom discussions :)” doesn’t actually provide one with the skills and tools to do that. So if you want to know how to TA better for your home institution, maybe sign up to be an NMA TA next year.</p>
<p>TAs also happen to be the highest-paid people during NMA, because we were the only paid people (with some minor exceptions). I genuinely don’t know what surprised me more: that every other member, including organizers, content creators/reviewers, and the operations and logistics teams, did all that work for free, or that TAs were not expected to do the same. You can certainly make a case that TAs will get a lot out of this whole thing, like networking opportunities, teaching experience, and learning the material for free. I actually didn’t know we’d be paid when I signed up. I’m not sure what my thought process was, but I would’ve told you that there is no amount of money in the world that can pay me to Zoom-teach 5 hours a day for 15 days anyway. Now that the whole thing has ended, I am happy to report that I would have indeed done it for free, so maybe I will donate some of it—suggestions are welcome (but it’s very, very nice to get monetary compensation for that work).</p>
<p><img src="/assets/images/blog/2020-09-16-etherealpony_squiggles.png" alt="" /></p>
<p>So just like that, the first day of NMA was upon us. After a silly ice-breaker activity (pictured above), I cautioned my pod that it would be unwise to have expectations similar to any previous summer schools they’d attended because, you know, the whole thing being over Zoom and all. But to be honest, by the end of the 3 weeks, it felt pretty similar to a regular summer course, if not better. Though the group drinking and sporting events, and watercooler science and networking were sorely missed. So maybe not similar, but certainly engaging, and even fun and “normal” at times. What is my metric for engagement? Number of times somebody looked like they were about to fall asleep. Am I especially bad (or good?) when it comes to falling asleep in class? Maybe. But it’s rare to have a 15-day streak where nobody falls asleep, because that’s an academic accomplishment even for some faculties attending seminars (though there were definitely days where my entire group was low-energy, confused, and discouraged by the afternoon, me included). No shade on all the in-person lecture-based summer schools, but this format of predominantly interactive and small-group hands-on tutorials is almost surely more engaging than a prominent guest speaker giving a standard research talk for hours. Good research talks are not the antitheses of good lectures, they’re just not really related.</p>
<p>Practically, the days went something like this: as a TA, I’d prepare the night before, or sometimes (read: often) the morning of. I’d watch the 30-min intro lecture, go through the tutorials to make sure that I could solve all the problems, and have some nuggets of insight in my back pocket should a relevant discussion come up. Then, everyone would trickle in around 11am, say good mornings and hellos, and get on with the day. Some days there were questions and discussions about the intro video, some days there weren’t, but it was really nice to “show up to work” every day for those 3 weeks and see the same people. My pod had 9 students, so I’d usually give a meandering overview of the topic for the day to the whole group, set a target for how much we should have gotten done by lunch (rarely met), and then split them up randomly into two Zoom breakout rooms. They pretty much took it from there. The NMA tutorials are all directly accessible via Google Colab, so there was no screwing around with Anaconda and local Jupyter environments, everyone just had to make their own copies in Google Drive. The groups would then go off to watch the micro-videos and read the embedded notes individually, and reconvene for the coding exercises. This was all pretty straightforward, and not much for me to report on, because I really didn’t have to do much except hop back and forth between rooms to answer questions and track progress.</p>
<p><strong>The one thing that’s really worth highlighting is the advice somebody gave during the TA training (maybe Kate?), which is that students are encouraged to rotate amongst themselves on who’s “driving”.</strong> I’m not sure if that’s official NMA lingo or if I invented it for my pod, but the “driver” is the person sharing their screen on Zoom for the current coding exercise and actually typing into their notebook and running the code, while everyone else dictates for them what to type. This is a bit anxiety-inducing for everybody at first, especially for those that are not so confident about their coding abilities. But funny enough, the driver typically needs to do the least thinking (if they wish), because there are 3 other people telling them what to type, while it still keeps the driver “with it” enough to at least be syntactically correct. Once the code successfully runs, my pod was pretty good about checking each other’s understanding and making sure that everyone was on the same page, and this is where encouraging people to ask questions really becomes useful, even though it might slow down progress sometimes. When everyone is satisfied, they move on to the next exercise, and somebody else drives. In this way, it keeps the pack more or less at the same speed, and is just better for group cohesion in general because everybody is constantly working together. It warmed my heart when I popped into a room and somebody asked “okay who’s driving next”, and there were times that I felt bad about this, because moving at the average speed means that it’s either too fast or too slow for <em>somebody</em>. But as a whole, the slower folks asked incisive questions that improved understanding for everybody, be it a python syntax question or a conceptual question, and the faster folks were <em>really good</em> about being as didactic as possible, asking others for input and not just banging out the answer on their own. Shoutout to Sami and Yue, who were really as much TAs as I was, not just in the way they were familiar with the material, but in how they engaged the whole group to participate even if they’d known the answer.</p>
<p>So all that for about 2 hours, then we break for an hour for lunch, which somehow was never enough for me to actually get done eating. This really was a full-time job for 3 weeks. Same thing after lunch, where we try to finish the rest of the tutorials for the day. Whether that actually happened varied day to day. Some days we got very close, some days it was an absolute slog, but we’d all be totally beat by 4pm. We always reconvened for the final 10-15 minutes though to review the high-level ideas of the day, how they relate to neuroscience, and then we all added to this ever-growing monstrosity of a concept map. You can’t actually see any of the words, but I thought this <a href="https://miro.com/">mind-mapping tool (Miro)</a> was really neat (again, at the recommendation of someone during TA training). Part of the variance was because my own expertise also varied day to day; I was pretty honest about it, and it became a fun learning process for me as well, especially to see connections I hadn’t realized before. It was like simultaneously knowing a lot of random shit I never use in my actual work, but also never being completely comfortable with a particular topic.</p>
<p><img src="/assets/images/blog/2020-09-16-mindmap.jpg" alt="" /></p>
<p>Lastly, it was very helpful to have a TA Slack channel where people discussed their wins and troubles, and seeing what other TAs did for their pods—above and beyond what was required—was also inspiring. I remember <a href="https://twitter.com/mmyros/status/1285680750940094465">seeing Max’s tweet</a> about using the Zoom poll to check for understanding of the previous day’s material, and I really, really liked that. So much so that I stole it for a few days, but it didn’t last very long because it was hard to adopt a new habit halfway through. And there are so many more aspects of this summer school that are worth mentioning, like the group projects, mentorship matching, online social activities, and yoga breaks - there just isn’t enough space for it all. Each component had its own technical and logistic issues, but that just comes with experimenting with new things (like the Mozilla Hub hahaha). I will say more about the curriculum in the next section, but I think one common complaint was that this felt more like a data science or machine learning course than a computational neuroscience course per se. Part of it was a lack of concrete neurobiology, and part of it was (at times) a lack of clear and explicit applications to neuroscience problems, such that the psych/cog/bio people might get lost in the math, while the ML/computational people were a bit underwhelmed by the whole brain side of things. In any case, it wasn’t going to be perfect on the first try, but from my perspective, it was an overwhelmingly positive experience, and I had just an absolutely wonderful time with my pod. One day, beers on me when we see each other in real life.</p>
<hr />
<p><a name="student"></a></p>
<h1 id="as-a-student-what-is-computational-neuroscience">As a Student: What is Computational Neuroscience?</h1>
<p>Throughout my PhD, I’ve had to grapple with what computational neuroscience means, because even though I spend 99% of my time working on the computer and have just participated in my 3rd computational neuroscience summer school in 6 years, I’ve never felt that I was a “computational neuroscientist”. The stuff I do just doesn’t fit the mold, and this is certainly true if you look at the work presented at Cosyne and follow that (implicit) definition, but it’s hard to verbalize what exactly that is. Maybe it’s a function of how the NMA curriculum is clearly organized, or maybe it’s the fact that I had to actively prepare in order to TA, or maybe it’s just that after you’ve been around long enough, everything eventually sinks in. Regardless, I finally felt like I got a very coherent, high-level view of the philosophy behind (mainstream) computational neuroscience, and as a result, its assumptions and limitations. Of course, NMA in no way represents all of computational neuroscience, but its purpose is to provide an overview of the field, and from my experience, it’s pretty consistent with the major directions in the last decade. What follows is not a criticism of the NMA curriculum or the field, but an attempt to contextualize what we’ve learned in those 3 weeks, and what alternatives there are to looking at behavior and the brain from a <em>computational</em> perspective. In brief, if there’s anything I’d add to the curriculum, it would be 1) a short but explicit primer on the history and philosophy of computational neuroscience, especially views inherited from the “cognitive revolution”, and 2) “model-free” time-series analysis techniques that many researchers rely heavily on (for those that use fMRI, EEG, LFP, etc.), but do not fit within the mold of traditional computational neuroscience (we’ll see why in a second).</p>
<p>On the Wikipedia page for “computational neuroscience”, it says that the term was introduced in 1985, and that its components had existed since the early 1900s with the OG integrate-and-fire model. Today, detailed biophysical modeling is almost its own niche thing, and the “lightly-biophysical” spiking neural network modeling works have largely adopted and become motivated by the same perspective: <strong>the brain is an information processing system.</strong> That statement might seem so obvious and uncontroversial that it’s hard to imagine another possibility, and while I certainly do not wish to revive the “brain is/isn’t a computer” debate on my Twitter timeline, it is interesting to note how the dominant view came to be. Probably not many people know this, I certainly didn’t until I got to work on this very <a href="https://www.nature.com/articles/s41562-019-0626-2">esoteric project</a> with some friends, but <strong>computational neuroscience and cognitive science are actually intimately intertwined in history</strong>, and one could argue that computational neuroscience is even a sub-field of cognitive science, or at least the offspring of neurobiology and cognitivism. You can read a <a href="https://www.cs.princeton.edu/~rit/geo/Miller.pdf">personal account from George Miller</a> himself, but he mostly focuses on cognitive science. Briefly, throughout the 60s, cognitive science was bubbling up as the interdisciplinary collaboration between a few fields to tackle the question of intelligence in the mind—computer science and neuroscience being two of them. This was spurred by technological advances as much as anything else, because computers became small enough and fast enough to do interesting things (this will be a recurring theme). In 1976, the Sloan Foundation created a Special Program in Cognitive Science, which was really tied together by this philosophy that the “mind” is an information processing device, birthed from the framework of “cognitivism” and later “computationalism” in psychology (as opposed to behaviorism). While cognitive science has somewhat evolved in the years since then (though still dominated by cognitive psychology), computational neuroscience sprinted off in this direction and, arguably, became a much more <del>ludicrous</del> (EDIT: I meant lucrative) research program (in terms of funding and popular appeal today, probably because, well, the brain). And because of this history, computational neuroscience and machine learning are inseparable, and it’s clear from the NMA curriculum as it reviews a series of methodologies developed in the last 30 years or so, showing that while time flows, high-level ideas are largely unchanged.</p>
<p>Let’s start from a very specific and straightforward example in machine learning: Generalized Linear Models (GLM). I really love this slide from Cristina Savin’s <a href="https://mfr.ca-1.osf.io/render?url=https://osf.io/qxfz9/?direct%26mode=render%26action=download%26mode=render">intro lecture (W1D4)</a>, because it shows how the components of a GLM are “plug-and-play”, and how those components are largely irrelevant to the <strong>key idea behind all GLMs (and all of machine learning): optimization.</strong> If you plug in the identity function as your nonlinearity and the Gaussian distribution as your stochasticity (likelihood), you get linear regression. Swap those out for the logistic function and the Bernoulli distribution, and you get logistic regression. Critically, they can be solved by the same maximum likelihood optimization procedure, <strong>which tunes the (linear) parameters of the model to best produce the expected output given any input.</strong></p>
<p><img src="/assets/images/blog/2020-09-16-glm_slide.png" alt="" /></p>
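<p>To make the plug-and-play point concrete, here is a toy sketch (mine, not from the NMA notebooks): the exact same maximum-likelihood machinery, with only the nonlinearity and the noise model swapped out, gives you either linear or logistic regression.</p>
<pre><code class="language-python">
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))          # design matrix (e.g., stimulus features)
w_true = np.array([2.0, -1.0])

# identity nonlinearity + Gaussian likelihood: linear regression
y_lin = X @ w_true + rng.normal(size=400)
nll_gaussian = lambda w: np.sum((y_lin - X @ w)**2)            # Gaussian NLL ~ squared error

# logistic nonlinearity + Bernoulli likelihood: logistic regression
y_bin = rng.binomial(1, 1 / (1 + np.exp(-(X @ w_true))))
nll_bernoulli = lambda w: np.sum(np.log1p(np.exp(X @ w)) - y_bin * (X @ w))

for name, nll in [('linear regression', nll_gaussian), ('logistic regression', nll_bernoulli)]:
    w_hat = minimize(nll, np.zeros(2)).x                       # same optimization call for both
    print(name, np.round(w_hat, 2))
</code></pre>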
<p>So far, all standard machine learning stuff, no neuroscience. At a higher level, you can even swap out the architecture of the model, as well as the optimization target or “cost function” (e.g., maximizing likelihood vs. minimizing least-squares error vs. adding regularization). A special case occurs when you take the exponential function as the nonlinearity and the Poisson distribution as the likelihood, which gets you the Poisson GLM. At face value, this is just another machine learning model, and can be applied to any data when the assumptions are appropriately satisfied (e.g., non-negative integer outputs). But under the hood, <strong>the Poisson GLM models how single neurons encode and process information, and it has its own name in neuroscience: <a href="https://en.wikipedia.org/wiki/Linear-nonlinear-Poisson_cascade_model">linear-nonlinear Poisson (LNP) cascade model</a>.</strong> (But see the end of the post for some corrections, more technical details, and historical tidbits on the difference between Poisson GLM and LNP, very kindly provided by Jonathan Pillow, quoted from <a href="https://twitter.com/jpillowtime/status/1306943813894770688">this exchange</a>.)</p>
<p>For example, V1 neurons are thought to act as linear convolutional filters of the visual field (e.g., Gabor patches), the outputs of these linear transformations are then mapped to firing rates via the exponential function, and finally, the neuron stochastically emits a discrete number of action potentials, drawn from a Poisson distribution whose mean is that firing rate. To be technically accurate, I believe the Poisson GLM is one way of solving for the parameters of an LNP model (another being the spike-triggered average). The sleight-of-hand is so subtle that it’s very easy to lose the distinction: on the one hand, we have a purely machine learning model that tries to map input to output, on the other hand, we have a generative model of how single neurons behave when an animal sees a picture, and all of a sudden, they are one and the same! More importantly, the LNP neural model inherits assumptions of the Poisson GLM machine learning model, maybe the most philosophically challenging one being that when we solve for “what the neuron cares about”, i.e., its parameters or filter weights, we assume it tries to do “the task” optimally. This then becomes a (rather light-weight) <em>normative model</em>, i.e., a computational model that solves a task optimally under the same environmental constraints the neuron (or person) is under, which we then posit to be solving the task the same way a biological neuron (or person) does, including even the internal representations.</p>
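<p>And here is the same toy setup pushed one step further (again my own sketch, not NMA or Pillow-lab code): simulate a fake “neuron” as a linear filter, an exponential nonlinearity, and Poisson spiking, then fit a Poisson GLM by maximum likelihood and watch it recover the filter, which is exactly the “one and the same” point above.</p>
<pre><code class="language-python">
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))               # "stimulus" on each trial or time bin
w_true = np.array([1.0, -0.5, 0.3])          # the neuron's linear filter

# LNP generative model: linear filter, exponential nonlinearity, Poisson spike counts
rate = np.exp(X @ w_true)
spikes = rng.poisson(rate)

def poisson_nll(w):
    """Negative Poisson log-likelihood (up to a constant) with an exponential nonlinearity."""
    return np.sum(np.exp(X @ w) - spikes * (X @ w))

w_hat = minimize(poisson_nll, np.zeros(3)).x  # maximum-likelihood Poisson GLM fit
print('true filter:  ', w_true)
print('fitted filter:', np.round(w_hat, 2))
</code></pre>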
<p><img src="/assets/images/blog/2020-09-16-pam_glm.png" alt="" /></p>
<blockquote>
<p>I fucking LOVE this meme.</p>
</blockquote>
<p>The Poisson GLM is not a cherry-picked example, and that’s the beautifully coherent picture I got to take away from 3 weeks of NMA. Almost every single day of NMA can be cast in this form, where the machine learning model is some optimization procedure to map inputs to outputs, and the neuroscience model <em>basically</em> says “well that’s probably how the brain does it too.” Bayesian inference? The brain does posteriors too. PCA and dimensionality reduction? Minimize data reconstruction error, see neural subspaces and manifolds. Markov chains, dynamical systems, and control theory? Minimize future state prediction error, recurrent neural networks in the motor and cerebellar system. Reinforcement learning? Minimize errors in action-outcome value mappings, via the temporal-difference error signal of dopamine neurons. Deep learning? Well, you get the idea, hopefully. The correspondence is not only convenient, it is intuitive: optimization of the model parameters is the “learning” part of machine learning, and maybe that’s similar to how the brain learns to perform tasks? And this is why biologically plausible learning rules for deep learning are such a big deal: they would make our lives a lot easier.</p>
<p>In this way, I understood where the <strong>“computational”</strong> in computational neuroscience comes from: <strong>computational neuroscience is more than just using modern computational techniques, like machine learning, to help decode or otherwise analyze neuronal responses. It’s (at least in large part) using machine learning as a model of the brain (biological learning)</strong>, and finding analogies between the model representation and neural representation, while attempting to understand the computations (or, transformations) that take place from input to output. Critically, components of the model can be (and often are) left alone as statistical or computational black boxes: it doesn’t really matter how the neuron does the linear filtering over the receptive field, and obviously V1 neurons don’t even have access to the visual field directly, but all that matters is that it “filters”. Of course, there is work being done that attempts to map the entire visual pathway to cast the operations as synaptic transformations one step at a time, but that’s more of an implementation detail than algorithmic insight. The vigilant reader will notice the use of lingo from Marr’s levels of analyses, and it’s no coincidence that David Marr was instrumental in both cognitive science and <a href="https://link.springer.com/chapter/10.1007/978-1-4684-6775-8_12">computational neuroscience</a>, whose distinction between computation, algorithm, and implementation translates to NMA’s very first lectures centered on “how to model”.</p>
<p><img src="/assets/images/blog/2020-09-16-compneuro_venn.png" alt="" /></p>
<p>The point is, the way the NMA curriculum was organized really made me appreciate that: one, machine learning is just different optimization problems, and one can choose different probability distributions, cost functions (error in space, time, reconstruction, distributions, etc.), and make different assumptions about the data to apply different techniques. This is not to trivialize machine learning, but to hopefully ease some anxiety when an ML-outsider (like me) sees the daunting collection of seemingly disparate techniques, because there is a unifying strand. More importantly, two, “computational neuroscience” as a field truly embodies the idea that the brain really performs these computations. Why is that even a point to bring up in a blog post about computational neuroscience? Because that’s only one of many ways of tackling neuroscience using a computation-heavy approach. Spiking neural networks, for example, are not inherently about computations at all. One can mimic the dynamics of single neurons and populations using networks of LIF or more sophisticated neurons and understand a whole lot about the physiological mechanisms behind, e.g., spatiotemporal correlations in the brain. Same is true of “mass models” like a network of Wilson-Cowan neurons. But mainstream computational neuroscience further asks: what computational properties, in terms of input-output mapping, do these dynamics endow? There are now efforts to interpret dynamical properties at both of those levels as integral to the computation in biological and artificial recurrent neural networks (e.g., computation-through-dynamics, reservoir computing).</p>
<p>Also, there are potential blind spots when these methodologies are not viewed within a historical context. First, as I hope is evident from that list of examples above, computational neuroscience tends to lock into the hot computational-method-du-jour. This is not inherently a bad thing at all, and to be fair, it has been a (fruitful and interesting) two-way exchange between machine learning and neuroscience. Parallel distributed processing, the grandma of deep learning, was one of the flagship products from that era of cognitivism (also closely tied to Cognitive Science, at UCSD, no less) that took ideas from neuroscience into computer science. The danger is just that trends happen, and typically, the same trend happens again in 30 or so years. GLM might be hot again in 10-20 years, who knows. There’s been enough written on both sides of this debate (which, funny enough, came up multiple times in my pod), and I’m not committed to either position very strongly, but there is some naivety in thinking that XYZ == the brain without acknowledging that we thought that for pretty much every new XYZ. I suppose you could argue that they are all in fact the same—coupled linear and nonlinear transformations—but then again, everything is dot products wrapped inside a nonlinearity so that’s cheating a bit.</p>
<p>The other blindspot is that strongly subscribing to the computational account, and therefore focusing on methods that take machine learning models as generative models of the brain, may be incompatible with “model-free” approaches, especially in the context of a summer school. Perhaps this is just outside the scope of NMA and speaks to a much larger divide between computational neuroscience and neuroscience-employing-computational-methods (especially cognitive neuroscience), but I think a non-trivial portion of the students came in wanting to learn more about techniques compatible with human data, some of which involve time series analysis, and, more generally, computational approaches that do not explicitly model the computational steps of the brain. Hell, even getting the spikes from broadband LFP recordings needs time series analysis. Again, this is not a criticism, and I certainly understand the choice of topics given the timeline. Also, typing out the sentence before that made me realize how weird it is that computational neuroscience in part came from cognitive science, but is now as far away as one can get from the field of “cognitive neuroscience”—computational cognitive neuroscience notwithstanding.</p>
<hr />
<p>Whoa, that got kind of long. Well, these are my thoughts from this whole experience, which has been incredibly rewarding and a definite highlight in this dumpster fire year of 2020. I could not recommend it more to people, either as a TA, student, volunteer, or mentor (or all of the above), and at the rate that this team iterates to improve, I’m sure NMA2021 will be even better.</p>
<hr />
<p><strong>EDIT: details on pGLM and LNP.</strong>
JP: “One quick note on GLM vs the LNP model. (I agree the differences are subtle, but there are a few key diffs!) 1) GLM assumes a fixed nonlinearity (eg exponential, sigmoid), whereas fitting the LNP model generally involves fitting the nonlinearity. (Technically, if you fit the nonlinearity along with the filter, you wouldn’t call it a GLM!). 2) The nonlinearity in GLM must be monotonic, whereas LNP nonlinearity can have arbitrary shape (eg quadratic). 3) LNP model can have multiple filters, whose outputs are combined nonlinearly (i.e. via a multi-dimensional nonlinearity), whereas GLM uses a single linear projection of the input. NB: the “maximally informative dimensions” (MID) estimator proposed by Sharpee et al 2004 is simply a maximum-likelihood estimator for the LNP model (albeit framed in terms of information instead of log-likelihood) <a href="http://pillowlab.princeton.edu/pubs/abs_Williamson15_PLoSCB.html">ref</a> (…) Although one additional complication is that with Poisson GLMs, it’s common to use spike history as input… which makes the resulting spike trains non-Poisson. Generally people <em>don’t</em> (for whatever reason) do this with LNP models. As a result, pGLM models can capture non-Poisson spiking statistics, whereas (standard) LNP models can’t. When I was working on these models during my PhD, Eero Simoncelli wanted to call pGLM with spike-history the “recurrent LNP” model, which I think is nice, but Liam won out and (following Wilson Truccolo & Emery Brown) we stuck with GLM.”</p>
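<p>(To make the last point about spike history a bit more concrete, here is my own toy sketch, not part of JP’s note, with invented numbers throughout: a Poisson GLM whose conditional intensity depends on the neuron’s own recent spikes. The resulting spike train is no longer Poisson; an inhibitory history filter acts like a refractory period and pushes the Fano factor below 1, which a standard history-free LNP model can’t reproduce.)</p>
<pre><code class="language-python"># Hypothetical sketch: Poisson GLM with a spike-history filter.
import numpy as np

rng = np.random.default_rng(1)
n_bins, dt = 20000, 0.001                       # 20 s of 1 ms bins
base_rate = 20.0                                # baseline rate, spikes/s (made up)
h = np.array([-5.0, -2.0, -1.0, -0.5, -0.2])    # inhibitory spike-history filter (made up)

spikes = np.zeros(n_bins)
for t in range(n_bins):
    # history term: recent spikes (most recent first) weighted by the filter
    past = spikes[max(0, t - len(h)):t][::-1]
    hist = float(np.dot(h[:len(past)], past))
    rate = np.exp(np.log(base_rate) + hist)     # conditional intensity (spikes/s)
    spikes[t] = rng.poisson(rate * dt)

# Fano factor of spike counts in 100 ms windows: about 1 for a Poisson process,
# noticeably below 1 here because the history filter suppresses spiking after a spike.
counts = spikes[: (n_bins // 100) * 100].reshape(-1, 100).sum(axis=1)
print("Fano factor:", counts.var() / counts.mean())
</code></pre>
<p>Set the history filter to zeros and the counts go back to being Poisson (Fano factor near 1), which is the standard history-free regime JP describes for LNP models.</p>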
<hr />
<iframe width="560" height="315" src="https://www.youtube.com/embed/XR7Ev14vUh8" frameborder="0" allow="autoplay; encrypted-media" allowfullscreen=""></iframe>
<blockquote>
<p>RIP.</p>
</blockquote>Richard GaoIt's like one of those movie scenes, where you get invited to Charlie's chocolate factory, and somebody opens the front gate and all of a sudden what looks like a shack from the outside is actually a gigantic compound with thousands of oompa loompas scrambling to put stuff together (not saying NMA folks are oompa loompas...), and you're standing there jaws wide open until one of them crosses in front of you wheeling a comically large cart of stuff and says to you 'please get out of the way sir.' Thankfully this was me browsing Slack and the various Google docs and sheets in my room, but I remember doing that for a solid half day.Fear. [10/52]2020-07-01T00:00:00+00:002020-07-01T00:00:00+00:00https://rdgao.github.io/blog/2020/07/black-lives-matter<p>Like many, I fear that I will have said something insensitive or incorrect here, or that I’m taking up too much space. I apologize if that is the case. But more than that, I fear how I will feel down the line if these words never left my mind. TL;DR: go listen to <a href="https://www.youtube.com/watch?v=3tR6mKcBbT4">Dave Chappelle</a>.</p>
<hr />
<h1 id="fear-i-death">Fear I. Death</h1>
<p>Death is unrelatable. We die once, and nobody tells us what it’s like to die. We can grieve, but most of us can’t empathize with the dead, or even the dying.</p>
<p>Fear - fear of our own death, and the deaths of our loved ones - on the other hand, is much more relatable. Most people, I’d think, are scared of certain aspects of death. If you’ve made it to now, you’ve probably feared for your own life at some point, and probably irrationally so (unless you’ve had a serious medical condition or were in an accident). Maybe you can’t relate with the black men who died at the hands of police officers, and maybe you think the world is not so simple, or maybe you simply cannot imagine the police to be anything other than defenders of justice. So let’s put all the nuances and the politics aside for a moment and look at the first principles - I promise, we’ll revisit them.</p>
<p>Don’t try to relate with George Floyd’s death. Relate with his fear. For more than 8 minutes, he went through the fear of an imagined, then imminent, death. His own death. I have this recurring and entirely irrational fear of (for some unknown reason) being bagged by the mafia, tied to cement blocks, and tossed into the ocean. I’m not that scared of <em>being</em> dead, strangely enough. I’m scared of the 90 seconds of hopeless struggle I will inevitably put myself through just to realize over and over again that I am going to die, and that there’s nothing I or anyone can do about it. I don’t know what particular thing calls up that fear for you, maybe it’s being buried alive, maybe it’s getting a terminal disease, maybe it’s having your car flipped over on a remote highway and slowly bleeding to death. Whatever that is, close your eyes and call it up into your imagination and live it over and over again, for a minute, two minutes, five minutes. Imagine all the things you wish you could have said or unsaid to the people in your lives, then realize that it is, in fact, too late.</p>
<p>That’s the fear of hoping against the inevitable. It’s pretty unpleasant, right?</p>
<p>Fortunately, for most of us, that fear dissipates the moment we open our eyes. But that is not the case for some of us. That fear, for 8 minutes, was George Floyd’s fear. It was Eric Garner’s fear. That <strong>IS</strong> the fear of black men young and old and it’s a goddamn travesty in this fucking country that those fears are <strong>actualized</strong>, day in and day out, and caught on camera, with nary a consequence.</p>
<p>More concrete is the fear of the death of a loved one, whether imagined or sometimes, unfortunately, unavoidable. These people - someone’s loved one at your age, your brother’s age, or your father or your son’s age - are going to potentially find themselves fighting for their lives after being accused of <a href="https://en.wikipedia.org/wiki/George_Floyd">using a fake $20 bill</a>, <a href="https://en.wikipedia.org/wiki/Killing_of_Eric_Garner">selling loose cigarettes</a>, or, you know, dancing to the music while going to the shop <a href="https://en.wikipedia.org/wiki/Death_of_Elijah_McClain#Death">to get ice tea</a>. In turn, their parents, their siblings, and their friends and coworkers have to face the fear that their loved ones might never make it home on an otherwise perfectly unremarkable day, every single day. Feel this for a moment: you get a call from an unknown number informing you of your worst nightmare, not because your boy did a dumb boy thing that you’ve told him millions of times not to do, like backflipping into a pool landing head first, but because the police, well, killed him in broad daylight. The color of his skin is not the official cause of his death, but we know that it might have saved him had it been a different one. We know that.</p>
<p>And so the first principle is this: not a single person, nor their families, should be subjected to the realization of these otherwise absurd fears for simply living their lives, the lives you and I live day in and day out, the lives that we now witness abruptly ending on what feels like a weekly basis. This is an axiom of life and a fundamental human right in a so-called “first world” country. We can discuss solutions and trade-offs, and we should, but if this is not a principle you are ready to defend at all cost, for your own life and for the lives of people who share a world with you, then there’s no point reading further.</p>
<hr />
<h1 id="fear-ii-power">Fear II. Power</h1>
<p>I don’t think all cops are bad. Like all professions, most people in it, by definition, are average, and there are those that are really good at their job, and there are those that are really bad. But being a police officer, like being a healthcare professional or any other emergency personnel, is not a regular job. There is an inherent asymmetry when it comes to life because time is asymmetric: there are many opportunities to save and prolong a life, and every successful attempt buys you a new opportunity, but there is only one opportunity to end a life if you so choose to take it, either by accident or with malice.</p>
<p>As a result, while the person whose life you’ve saved may be eternally grateful, you’ve only extended the natural course of things, and because it is your job, you did what you were supposed to do. The only reward comes from knowing how many lives you’ve touched, and if you did it for long enough, maybe a medal. Coupled with the fact that, in a stable society (though that’s looking less and less applicable to America), people rarely come across dramatic life-or-death situations that require forceful intervention, suddenly the taking of lives becomes a lot more obvious - and easy - than their prolongation. I think this is important to keep in mind.</p>
<p>I’m biased, though. I’d grown up entertaining the notion of becoming a police detective, because I loved Sherlock Holmes and Detective Conan. Even though Sherlock Holmes was actively antagonistic towards the police constables, short of opening my own private agency, becoming a police officer seemed like the only real option for solving mysteries for the good of the people. Even if you took away the magnifying glass and cool CSI gear, at the core of its idea, the police is a gentle but just force: it’s who you call to get a kitten off of a tree (I didn’t watch a lot of shows about firefighters) and to shoot a villain dead. I never considered it seriously, though, partly because my mom would kill me first, but mostly because I thought I didn’t have the nerves for it: sometimes, I (still) scream like a little girl. If you gave me a loaded gun, I honestly don’t know if I’d make the right call time after time, when it comes down to a perceived threat to my own life vs. giving someone a chance to change theirs. Such is the burden someone carries in a profession universally hailed as heroic: to routinely put your own being at risk in order to serve, help, and save others, even if you have the power of the law backing your gun.</p>
<p>After we moved to Canada, I spent my teenage years listening to Tupac and Jay-Z talk about crooked cops, and through my involvement with various organizations, and my own (very minor) run-ins with the law, it dawned on me that 1) any individual police officer does not always <em>try to do good</em>, and 2) the law affects me - a scrawny Asian teenager - a lot more differently than it does people who appeared to be different, even though I did my very best to wear as many oversized and hooded clothing items as I could, because that’s the culture I grew up in and wanted so desperately to be a part of. These last 6 years in the U.S., though, marshaled a whole new level of degeneration in my idea of what the police represents. Eric Garner’s murder made an impression on me, and that moment shifted my perception of what reality in the vaunted United States of America - especially for poor and black people - really looks like, and it’s only getting worse. I had wanted to write something then, and I’d wanted to write something after every murder: Alfred Olango, Philando Castile, the list goes on and on and I’m ashamed that I don’t even remember all the names anymore. I could never find the right words, because it was senseless - it literally didn’t make sense - but somehow it always seemed like a complex situation. No matter, these are the words I could muster up today.</p>
<p>Derek Chauvin, the man who killed George Floyd, is without question a murderer, and his colleagues, accomplices. This is as plain as day from the video recording: he was not acting in self-defense, it was not an execution ordered by the law, and it was certainly not an accident, as those 8 minutes slowly unraveled and the dying man called out for his dead mother. If you change his uniform for any other colored shirt, you will easily see the situation for what it was: murder. However, the uniform and the badge are precisely the things that complicate, not the reality of the situation, but how people can choose to see it. I get it. People want to side with the law and believe that the police officers did nothing wrong because they must have had a reason to do what they did. They must have, right? These petty criminal men, even if they didn’t deserve death, should not have resisted arrest, right (even though we see again and again that cooperation does not mean survival)? Maybe this benefit of the doubt rightfully applies to most police officers, in most situations, but make no mistake, it does not apply here. Because of the uniform and badge, Derek Chauvin was not forcefully stopped nor lawfully apprehended on scene for killing another man. Because of the uniform and badge, Daniel Pantaleo, the officer who choked Eric Garner to death in 2014, was apparently fired in 2019, 5 years later, with no criminal indictment. Such is the power granted by the uniform and badge.</p>
<p>Like I said, I don’t believe that all or even most police officers are “bad”, whatever that means, and I certainly don’t have the expertise in police reforms or community protection policies to offer concrete and informed thoughts. Some of the conversations that have recently come to the forefront of our attention about defunding major metropolitan police forces, which can take up a significant proportion of the city’s funding and upwards of billions of dollars, seem hopeful and innovative. So are the conversations about police reform, investing in developing local communities, healthcare, and education, and god forbid, de-weaponizing your local traffic cop. Fuck if I know what the best thing to do is, and my ignorance on these topics is not to imply that these conversations have not been happening for a long time, but simply to say that I write from how I feel, and how I feel is this: the uniform and badge were granted by the people to defenders of justice so that they can wear it with pride when they repeatedly put their lives at risk, when it becomes all but necessary to use deadly force to defuse a situation (which, coincidentally, happens a lot more often in America. I wonder why). Drawing your gun at the first available opportunity is not that, choking an unarmed and defenseless man is not that. You have turned your proud shield into an ugly and small mask, behind which you wield your powers while hiding your fear of all that threatens, not your personal safety, but your ego and your perceived way of life - an asymmetric and unfounded fear that gives little back while taking the entire lives of others.</p>
<p>Post-script: there is so much more to say about larger structural and systematic issues within American law enforcement, i.e., how police forces are trained and militarized that turn good-doing people into…less good ones. Again, I’m obviously not an expert, and tons of conversations I don’t know much about are happening, but here are some <a href="https://www.cnn.com/2020/06/08/us/us-police-floyd-protests-country-comparisons-intl/index.html">police stats</a> and a <a href="https://medium.com/@OfcrACab/confessions-of-a-former-bastard-cop-bb14d17bc759">personal statement</a> from a former cop - it’s a good read, even if anecdotal (thanks, J). A quote:</p>
<blockquote>
<p>The question is this: did I need a gun and sweeping police powers to help the average person on the average night? The answer is no. When I was doing my best work as a cop, I was doing mediocre work as a therapist or a social worker. My good deeds were listening to people failed by the system and trying to unite them with any crumbs of resources the structure was currently denying them.</p>
</blockquote>
<hr />
<h1 id="fear-iii-white-supremacy-and-the-normal-way-of-life">Fear III. White Supremacy and the “Normal” Way of Life</h1>
<p>Boy, that escalated quickly.</p>
<p>People continue to confuse their wish of a meritocratic society for the delusion that that is the world we live in today. Often, we wish for something so badly that we believe it to be true. That’s why “black lives matter” is a poor slogan. It is a statement of a platonic ideal, not a statement of reality. Really, what we want to say is “black lives should matter, but they don’t, so please, let’s do something about it, because <strong>we want black lives to matter.</strong>” To the people that suffer from the reality that black lives don’t matter, it doesn’t make a difference, because no black man is going to confuse the platonic ideal and the reality today when their life can end because they got pulled over for a minor traffic non-offense. But for a person who doesn’t know the difference, and who thinks the world they live in is a just and fair world, either by blissful or willful ignorance, “black lives matter” sounds like a chant for exceptionalism (i.e., “black lives better”…how do people come up with this shit). Unfortunately, the exceptionalism is not in what we want for black lives, but in how it currently is, and not in a good way.</p>
<p>On the other hand, “all lives matter” is a curious response. I don’t disagree at all, all lives do matter, of which black lives are a subset. Saying “all lives matter” in response to “black lives matter” contributes literally no semantic information to the conversation when taken as statements - it’s all true, we agree. But that ignores the context of this conversation, and if I’ve learned anything during my PhD, it’s that context matters tremendously in language and communication. So what is it really saying?</p>
<p>Here’s my analysis: I suggest that we treat “all lives matter” from the same perspective, not as a statement of a platonic ideal, but a request, and in this case, a counter-request: “other lives should matter too” and “my life should matter too”. In most cases, this means “white lives matter too”, though my fellow Asians are not exempt from this (which always boggles my mind, more on this later). More specifically, the word “life” has a different meaning here: all lives are not literally at risk the same way black lives are, but people fear that these protests and reforms and affirmative action policies will threaten, not their physical lives, but their <strong>way of life</strong> - a life of normal. People fear that all this pot-stirring means that a spot at Harvard will be given to an undeserving colored life, rather than a hardworking and law-abiding, and most importantly, uncolored life, a life that has done all that they can to follow the rules of this just and meritocratic society, because that’s the way it should be.</p>
<p>Except, that’s never been the way it is.</p>
<p>One unforeseen but much appreciated consequence of our reaction to yet another senseless police-killing of a black man is that it has pushed every profession to do some introspection on its biases towards black and other-colored people. And hopefully, also minorities of different kinds, be it gender, race, sexual orientation, or disabilities, and to appreciate their unique contributions. Some of that, no doubt, is lip service. But as a friend wisely (and sadly) put it, at this point, even lip service and pretend-wokeness is better than staying silent and complicit, because at least it keeps the fire going. What has come up over and over again during this process is that, well, black and colored lives and their contributions don’t matter…even at the fucking Bon Appetite Test Kitchen?? Goddamn it, my one source of warm reprieve.</p>
<p>Academia, though, is especially and unsurprisingly egregious. Someone fact-check this source, but there was a <a href="https://mobile.twitter.com/sadieonscience/status/1271998586163736576">tweet going around</a> (from unpublished but formal research) showing the proportion of faculties at various R1 university psychology and neuroscience departments, and one of them literally had 0.</p>
<p><img src="/assets/images/blog/2020-07-01-diversity.jpeg" alt="" /></p>
<p>Zero.</p>
<p>What’s more surprising is the consistent denial that this is a fact and a problem from people of all categories within academia. If you grew up a white straight man, possibly even upper middle-class, to become an academic - sure, I can see where that perception comes from, and I don’t blame or even dislike you: as far as you’ve seen, the world is fair and meritocratic, and if I were you, I’d want to keep it the same way too. As I alluded to above, however, what’s hilarious and perplexing to me is how people of literally any other label, and their intersections and inter-intersections, can maintain that this whole enterprise is “fair” and working <em>for them</em>, because every single one of them, of us, is getting fucked. The answer is not yes or no, the answer is “by how much?” Women? Check. Gay? Double check. Poor? Check, unless… you’re not Matt Damon and/or a white janitor who can do math, are you? Black? Fam, CHECK. Gay, black, and poor? I’m sorry sir, are you lost? This is not a Wendy’s.</p>
<p>Obviously biases are not specific to academia. If anything, academia is perhaps progressive enough - compared to law, finance, and medicine - to openly acknowledge these problems. But I do find it curious that academia has not solved these problems together, working from the foundation that all perspectives are valuable, purely from the perspective of bringing in cultural insight. My hypothesis is that scientists, more than those in any other profession, dedicate their lives to finding order in this world with limited available data - finding order often prioritized over getting more data. In this case, it appeals to our inner peace to believe that there is order and justice in this ascetic profession, where we are devoted to nothing but the truth. And when we want to see it, we do, because then we can focus on science, not politics. Politics is uncomfortable, messy, and frivolous, not for your pure analytical scientist-type to touch (or maybe I’m projecting). But just because we do not see how a compounding of systematic frictions works in minute and invisible ways against our colleagues, and often even against ourselves, it does not mean it is not happening. Everywhere there are people, there is politics. My parents moved from China to Canada because they thought North America would have less covert and insidious politics for me to deal with, and I thank them for sacrificing their lives back home to do it. But they were only partly right: there is covert and insidious politics everywhere, America included, it’s just that minorities usually don’t even have the privilege of participating in it, because you can be spotted as an outsider from miles away - black, yellow, or brown - even before your funny mannerisms and your funny accent and your funny-smelling lunch had a chance to give you away.</p>
<p>Or maybe <a href="https://www.sandiegouniontribune.com/news/science/story/2020-06-08/salk-institute-email-black-lives-matter">old white dudes</a> just don’t get it because they didn’t have to deal with all this racial diversity non-sense back in the old days. Equally plausible.</p>
<p>It might sound like I’m bitter, or that I feel venom towards certain people, or even that I’m pushing aside the voices of the rarely-represented colored scientists to air my own grievances. Those are not my intentions, nor am I bitter. I simply write what I believe to be truth, truth that I have accepted with no harsh feelings. Being a straight Asian man in academia, by all accounts, is as natural of a home as I will ever find (though that comes with its own problems). But growing up in this society as a minority, you must first acknowledge the truth that you are different from normal, and you must certainly not expect to walk into someone else’s home and be accepted and crowned king right away, or ever. That is delusional. These are the things I will tell my children, and my grandchildren, and I will be grateful for my privilege that I do not have to fear for my life or the life of my family, nor will I fall victim to the fears of a police officer or vigilante gunman as I take my long evening walks into nice neighborhoods. I do not have to tell my kids to put their hands where officers can see them when they are pulled over, or to make no sudden movements under any circumstances, <strong>ever</strong>, in the presence of law enforcement. Actually, no, that is not a privilege, but a right, and a right some of us do not currently possess.</p>
<p>As for the changing way of life: I, too, fear sometimes that what could have been mine is given to someone else on the basis of my and their color, gender, or whatever attribute the institution wants to put on a poster on a given day. That is a natural fear, and if you fear the same, you shouldn’t feel shame or run from it. It <em>is</em> a competitive environment, and we all want the best for ourselves. If I had inner club privileges, I’m not sure I’d be so quick to give that up either. That’s why I don’t feel any animosity towards any individual that makes up and unknowingly participates in this institution, white or not, because they were born into it. Many of my best friends are white (…and that sentence will never not sound stupid, even if true). But if you are a sliver of the minority pie - not only in race and color, but in every other attribute - that’s not empowered by this system, then you <strong>will</strong> empathize with the struggles of your friends and colleagues, not in degree, but in kind. You <strong>will</strong> know what it’s like to be in a place where you’re not expected to be. And if you can let those fears pass through you - after all, they are rarely a matter of life and death - you will see that this whole place is full of people who are not supposed to be here - people who differ, who are <strong>not normal</strong> - and that we can unite in our differences, and in our fears, to make it a better place for everybody.</p>
<p>Post-script 2: that ending got a little dramatic there, but it felt right. Sue me. I guess what I really want to say is: for as long as I’m in this thing, I offer my help and potentially poor advice to anyone - no matter what color or shape, but especially those underprivileged - wishing to pursue their goals in brain and cognitive sciences, freely.</p>
<hr />
<iframe width="560" height="315" src="https://www.youtube.com/embed/9nfVWiXY3WY" frameborder="0" allow="autoplay; encrypted-media" allowfullscreen=""></iframe>Richard GaoBlack lives matter. Or, they should matter, but currently, they don’t. That’s it. You can stop reading here. If you do go on, please note that there are some discomforting, charged, and potentially triggering words about death and our current way of life. They are meant to make you uncomfortable, not traumatized.Fear and perfectionism in creating, and a horse is a hand. [9/52]2020-05-12T00:00:00+00:002020-05-12T00:00:00+00:00https://rdgao.github.io/blog/2020/05/fear-and-perfection<p>You’d think that with all this free time I stumbled upon during the quarantine, I’d be churning out blog posts every other day, especially having committed myself (in writing, no less) to doing one every week this year. I could probably come up with all sorts of reasons for why it’s not happening and for why I haven’t posted anything in almost two months, like the stress and anxiety from being stuck at home during covid, the general state the world is in right now, etc. That might have been true for the first few weeks, where life was just in complete disarray. It still kind of is now, but I’ve definitely adjusted, and it’s not all that bad for me personally, or maybe I just gave up hope. Also, I’m in the paper-writing phase of a project, and maybe I’m fresh out of words? While I think these are all true to some extent, I don’t think they are the main reason for this hiatus. I gave myself a pass for the first few weeks, but I’ve been getting progressively more annoyed, because I’m still writing things, just not finishing and posting concrete posts. So I sat down this weekend to think about why this might be happening.</p>
<p>I think I’m afraid: I’m afraid that the next thing I write and throw out there into the void of the internet will not be as good as the last one, or that I will say something stupid and offensive unintentionally, or simply that nobody will care and it won’t get as many likes on social media, and that I’ve peaked at 100 retweets on a dumb conference summary.</p>
<p>Fear manifests itself in many ways. If you were to ask me, “are you scared of writing a blog post because it won’t be as good or people won’t like it as much?” I would’ve said hell no, because I write for myself first and foremost. Well, that’s the intention anyway. But also I mean I clearly write stupid shit on the internet all the time so that can’t be bothering me.</p>
<p>I think how the fear manifests right now is through a prohibitive perfectionism. I think about covering all potential branches of an idea even before writing a single word, because I want to be complete, to prepare for every conceivable possibility. It’s not because I think somebody will attack my argument on how toilet paper should have two different sides, or that we are all yogurt. But I want to feel that it’s good enough because I’ve demonstrated that I’ve thought this out thoroughly, and that I’m critical and clever in my coverage of the topic at hand, whatever trivial thing that may be. Now that I’m writing this, it sounds completely absurd.</p>
<p>I’m actually having a pretty interesting experience right now, bouncing back and forth between just letting the pen flow across the page, to stopping and thinking in my head to put together a sensible sequitur and a well-crafted sentence, which completely takes me out of the flow of writing. This latter mode is how I’m writing my paper though, with a lot of thought put into each sentence to convey maximum information with maximum clarity and precision. But for a blog post I’m hosting on my own website, there’s really no need for this kind of internal scrutiny. In fact, why I’ve set this up and why I’ve committed to writing regularly was precisely to have a space to be as imperfect as I need or want to be.</p>
<p>But creating something is already so hard, just from a technical point of view. It’s made harder when the imaginary critic is actively involved in this process, and in this case, the critic is the internalized and perceived reactions of others reading this. I think I’d said this in the first post of this year even, that a key skill I want to practice is to actively separate the process of writing and editing. More generally, to separate the process of creation and criticism—constructive or otherwise. But knowing in principle and doing in practice is very different, and fear takes over and freezes you in thoughtland readily.</p>
<hr />
<h1 id="perfection-and-fear">Perfection and fear.</h1>
<p>People say that perfection is the enemy of good, or of done. That’s only partially true. Perfection, and perfectionism, is not the enemy of good inherently. Striving for perfection is the only way <em>to be good</em> and to get better, kind of by definition. But because perfect doesn’t exist, refusing to stop until you reach perfect is a fool’s errand. That much is true, but I don’t know how often I am, or if anyone is, actually setting out for perfection.</p>
<p>Fear, and the fear of wanting and working hard to try to be good but failing, on the other hand, is just as great of an enemy of good, of even starting, as perfection is.</p>
<p>A friend remarked a couple of months ago on how confident I appear in my writing. I thought that was interesting, and puzzling, because I don’t feel very confident very often, especially in anything that involves interacting with others as a human being, whether in person or through writing. She said that the fact that I’m able to put something out there that aims to be informative and potentially educational is a sign of confidence. Looking at the pieces of writing I had the most fun producing - in that it was effortlessly written - it’s not that I was the most confident in what I had to say, it’s that there was another emotion that overcame fear, and that I didn’t think about being confident or otherwise at all. I just didn’t think, period. Sometimes it’s the excitement over something I find really cool and that I need to share through writing. Sometimes it’s overwhelming emotion of joy or sadness. Most of the time it’s a need to sort through an idea or argument in my own head to settle the anxiety of confusion. But also very often it’s an outpouring of words from utter frustration and disbelief over how something in the world is, or how someone could possibly think that the field potential is the exhaust fumes of cortex. That last one, I learned, is called a diatribe. Those times, I don’t even edit much. I put the words down and forget about it.</p>
<p>On the other hand, when there’s no emotional drive to express, then fear overtakes the need to create, even though week after week I put “blogging” on my weekly to-do, even starting drafts for many ideas I have. I don’t know why this is inhibiting me in particular right now, or if it’s even more prominent right now or I’m just realizing it. Maybe it’s covid-related, that I don’t feel so strongly about anything in life in particular right now because every day is the same. Or maybe my paper-writing, and the tight feedback cycle of criticism there, is bleeding over. Who knows. What I do know is that perfecting the craft of creating cannot happen if no words ever make it out onto the page.</p>
<p>I’m writing as if I’ve peaked and won a Grammy, but all this is, is some random blog post on the internet. I can’t imagine the anxiety of producing actual art that people can only judge through whether and how much they like it. Maybe the process of overcoming fear and actively finding things in life that inspire or trigger me will reengage my feelings during this pandemic and make these groundhog days a little less flat. As always, there’s a lot more I want to say here, for completeness’ sake, especially when it comes to the practice of writing. I’d always thought that writing is an intellectual process, because it involves putting verbal thoughts onto paper, literally just transcribing what I think, and that I’m horrible at truly creative expression like painting or even other forms of writing, like poetry. But maybe it’s not so different after all. As much as “thinking” is required, it needs to happen before and after writing, and interferes with the process of creating itself, especially if I’m trying to sound thoughtful or clever or “complete”, or whatever else I want to convey. Obviously the extent to which one should be bothered to edit is different for a blog post vs. a scientific article on which one’s career depends, or even a tutorial that I’m writing with the explicit intent of teaching somebody something. But here, this space is truly a free one for me to mess around and still have the opportunity for interesting feedback and interaction with people.</p>
<hr />
<h1 id="a-horse-is-a-hand">A horse is a hand.</h1>
<p>Just so you don’t leave here empty-handed after my self-therapy, here’s something I learned recently that is fantastically absurd: two weeks ago, I got into one of those late night conversations about completely random things with Matt and Zimu, and for whatever reason, I was told to watch this <a href="https://www.youtube.com/watch?v=X8p6s-XgYKQ&feature=share&fbclid=IwAR0ELfO4ZiAJs1ZEjNfgZ4jS1liJ0NPsbNeMT0WhswW7nxTdO8g2t3S_6So">YouTube video</a> of this guy horseshoeing his horse. Now, the place I’m from is surrounded by pastures and plains, but I grew up a city boy entirely, so I don’t know if this is common knowledge. I had never really thought about this too deeply either, but I’d always felt that putting horseshoes on a horse was, maybe, a minor act of animal cruelty? Because you’re nailing this heavy metal thing to the bottom of a horse’s foot, and that can’t be pleasant. But I figured they just did it once in the horse’s life, maybe after the horse reaches adulthood, and that the protection the horseshoes offered for the horse was worth the momentary pain. It was kind of stupid, but not <em>that far</em> from the realm of possibility considering how people interact with animals.</p>
<p>So you can imagine my mind being blown in real time when the guy started shaving stuff off of the horse’s hoof. Some discussion between us ensued, and as it turns out, I’m an idiot. The horseshoe does not, in fact, attach to the bottom of the horse’s naked foot, which I thought was what a hoof was. The hoof is actually a thick chunk of keratin, which is what fingernails are made of, and grows much like fingernails do. That was the point of this horseshoe video in the first place, that somebody needs to periodically re-shoe the horse and clean up the hoof. Now the whole “Jello is made of horse hooves” thing makes so much more sense. So then the gears start spinning in my head: if there’s a fingernail at the end of each leg on a horse, does that mean a horse’s leg is actually a finger, and that ONE horse is ONE HAND with 4 fingers? A horse is a hand???</p>
<p><img src="/assets/images/blog/2020-05-12-horsehand.jpg" alt="" /></p>
<p>So I looked this up today, and it’s like, not <em>that far</em> from the truth, but that’s also not even the wildest thing. Apparently <em>people</em> have known that the end of a horse’s leg is, basically, a finger, and it was thought that a horse has one finger on each leg. Horse hoof on <a href="https://en.wikipedia.org/wiki/Horse_hoof">Wikipedia</a> says this:</p>
<blockquote>
<p>A horse hoof is a structure surrounding the distal phalanx of the 3rd digit (digit III of the basic pentadactyl limb of vertebrates, evolved into a single weight-bearing digit in equids) of each of the four limbs of Equus species, which is covered by complex soft tissue and keratinised (cornified) structures.</p>
</blockquote>
<p>I read this a few times and I can’t wrap my fucking head around it because I just couldn’t parse whatever was inside the parentheses: why does it say <em>the 3rd digit</em> if each leg only has <em>one</em> digit? What on Earth is an equid??? After some more poking around, I learned that the conventional wisdom is that ancestral horses had 5 fingers, you know, as one does. But over time, they evolved such that only the middle finger remained. I mean “middle finger” only makes sense when it’s contextualized by the fact that horses <em>had</em> 5 fingers, but I certainly have a newfound appreciation for horses now that I know the default state of a horse is that it’s flipping you off from all 4 limbs.</p>
<p>But maybe they really are flipping you off with the middle of their 5 fingers, because <em>even wilder</em> is that <a href="https://phys.org/news/2018-01-toes-horse.html">a paper from as recent as 2 years ago</a> argued that all 5 digits are, in fact, still present on the horse’s leg. It’s not visible by eye, but the vasculature and nerve-endings support the fact that all 5 fingers are very much there. Imagine how much someone hates you if one day they decided that, fuck it, I will now do everything with <strong>only</strong> my middle fingers, including walking on it, so I can be flipping you off at all times.</p>
<p>Now, if you’re a centaur…</p>
<hr />
<iframe width="560" height="315" src="https://www.youtube.com/embed/OirAOHoZWg0" frameborder="0" allow="autoplay; encrypted-media" allowfullscreen=""></iframe>Richard GaoFear, and the fear of wanting and working hard to try to be good but failing, on the other hand, is just as great of an enemy of good, of even starting, as perfection is. [...] I mean middle finger only makes sense when it's contextualized by the fact that horses *had* 5 fingers, but I certainly have a newfound appreciation for horses now that I know the default state of a horse is that it's flipping you off from all 4 limbs.Climate of Uncertainty: COVID-19 & the Unquantifiable [8/52]2020-03-19T00:00:00+00:002020-03-19T00:00:00+00:00https://rdgao.github.io/blog/2020/03/climate-of-uncertainty<p>Late into the evening of March 9th, I bought my plane ticket to Munich for March 30th, for a 3-months research visit. I was going to learn some dope Bayesian inference tools in the Macke lab for modeling neural circuits, and also to take a breather from my own work before I defended my PhD at the end of the summer. 3 weeks before I was to fly out is probably the earliest I’ve gotten a plane ticket in recent memory. Well, serves me right to try to do things early.</p>
<p>This is not how I thought it would end, not at all. No, I’m not talking about the end of the world, not yet at least. I’m talking about the end of my PhD.</p>
<p>The next morning, March 10th, I wake up to the news that 1) UCSD is set to move all classes online in Spring quarter and the last week of Winter quarter is all but cancelled, and 2) the entire country of Italy is now on state-declared lockdown. It’s March 19th today. These last 10 days in between have felt like two months, with bad news coming one after another, one drastic policy announcement after another. UCSD campus is now fully shut down, all sizeable gatherings (including PhD defenses) are to be conducted online, both the US and Europe are about to/ have already restricted travels for incoming non-citizens, and it’s somewhat unclear if I could even come back to San Diego if I were to go back home to Toronto. I went from having a flight and looking for flats to definitely not leaving to the world melting down, all in a span of 5 days, then spent the next 5 days depressed.</p>
<p>To be fair, I should have seen this coming, at least enough to not be lured by the ridiculously cheap plane ticket, because things were already serious across the world by early March. Even though that was fully my own stupidity and ignorance, I think it really serves to highlight how suddenly things evolved and then came to the forefront of my consciousness, as if my awareness of the situation itself also grew exponentially. I swear if I see another “exponential growth” plot on Twitter, I’d throw up exponentially and never plot anything in semi-log again.</p>
<p>So why am I writing this? Why am I occupying my consciousness (and yours) if I’m already filled to the brim with grim projections and never-ending bad news? Well, in part, it’s therapeutic for me, because I’ve thought so much about it and it feels weird to be writing about anything other than this situation at the moment, that I need this brain dump. I’ve seen <strong>a lot</strong> of numbers and forecasts on Twitter and through various news outlets over the last 10 days. Obsessively so. I’ve also seen some tweets and had some real life conversations about how it’s been tough for some people to deal with this climate of uncertainty, beyond a concern for those friends and family that are at higher risk. Those real-life feelings are discussed much less often, probably because they don’t fit nicely into a tweet, but it always makes me feel a little better to know that my crazy anxiety about the world isn’t just me being crazy. I usually don’t get too worked up over these kinds of things, though I can’t say I’ve lived through a bonafide pandemic in my 29 years of life. In any case, I figured I’d write about how shitty my last week has been, and some other scattered thoughts about the coronavirus here and there, so that maybe some of you will relate and find it cathartic as well. Do listen to the song at the end though, it’s topical.</p>
<p><strong>TL;DR</strong>: if you’re having anxiety over some decision about where to be, or just life decisions in general in this time of extreme uncertainty, I’m with you. Some vague advice: consider your most important priorities and optimize for just a few of those, because things will move too quickly for the unimportant stuff. And get off of Twitter.</p>
<hr />
<h3 id="doomsday-diary">Doomsday diary</h3>
<p>The weird thing is that not once in the last 10 days was I concerned about my own health during this pandemic. Literally zero. Maybe I should be since there were obviously some young and spry adults around my age who did unfortunately die from serious complications of the coronavirus. But for the most part, I have no co-morbidities and the statistics backed up my newfound sense of invincibility: were I to catch the virus, I’d probably get flu-like symptoms, stay home for two weeks in bed, and walk away with an immunity (don’t worry, I’m still practicing all the personal safety recommendations). You’d think this sense of invincibility would allow me to carry out my life unaltered, but it’s times like these that make you realize how embedded one is in this global infrastructure of people, things, and places, and I was just really affected by the logistics in a time of extreme uncertainty, while worrying about being a local and global transmission vector, and then dealing with a feeling of total lack of control over my own life.</p>
<p>I was so taken aback by the progression of events last week that I started writing down all my thoughts and observations on March 10th, sort of like a joking-but-also-not diary of the apocalypse, so that if somebody 200 years later found this random dude’s recounting of how everything went down, maybe they’d make a movie about it. I won’t copy over the full thing, but for the first two days (March 9 &amp; 10), it was more or less just me marveling at how suddenly responses ramped up in Europe and North America over maybe a 3-day span, how I knew about all of this since the beginning of January when news about lockdown in China broke out, and how I completely failed to take this seriously until it reached my front door. When I started writing these thoughts down, I was in such disbelief that I thought I had missed something in the global sequence of events, so I looked up <a href="https://www.nytimes.com/article/coronavirus-timeline.html">coronavirus timelines</a> (this one in the NY Times), and I guess, unsurprisingly, I had heard about all of them between Twitter and my morning news. Thinking back, it was kind of insane that I was simultaneously hearing about a <strong>state-mandated lockdown</strong> of some tens of major cities in China and also going about my own day as if nothing was happening in the world. I remember hearing about those stranded cruise ships that had coronavirus patients onboard, and thinking that must suck major balls to be locked in a place knowing that you’ll almost certainly get this superflu from someone, then immediately going on to joke with a friend that if everything was to go to shits in the world, buying Prada or P&amp;G stocks was probably a good idea because people will always want handbags and shampoo. That was early February, barely a month ago.</p>
<p>Actually, I think my own shift in awareness of how grim the situation was preceded many of my friends’ in California, and certainly the US as a whole, by about a day or two, only because my own future was so intimately linked to not what will happen in the US, but what has happened in Germany. Those couple of days sucked, because it seemed like I had to make a quick decision about whether to delay my trip, especially as I was apartment hunting in Munich. It was this strange and paradoxical feeling of anxiety that there was simultaneously not enough information to make a decision, and that too much information was coming in too quickly to even make a decision because there’s no point deciding to go if there was no Germany left (that, fortunately, has not happened yet). These last two weeks of March were already going to be really stressful as I was trying to finish up a project before I left, arranging whatever personal things there was to arrange (leave of absence, taxes, my apartment, etc.), and figuring out my travel plans and living situation over there. <strong>So it was like this weird paralysis of trying to run two sprints at the same time, but in different directions.</strong> In hindsight, it’s kinda funny, and the rational thing would’ve been to just wait to see what happens because even though the ideal outcome would’ve still been for me to go, and I can just finish whatever I need to finish over there, it was also totally fine if I were to stay in California, as I am doing now. Situations like this really make you rethink what is essential and important (more on this later).</p>
<p>I’m writing this not just to tell you about my sad life (it’s really not that bad in hindsight), but because I know a lot of my friends working or studying abroad are facing this exact dilemma right now, between going back home to their families to weather this storm and staying put where they are so they can be close to their regular day-to-day community. It sucks, and I don’t really have any good advice as far as what the optimal thing to do is, other than that making and committing to a decision made me feel better emotionally if for nothing but to make the anxiety disappear. Also, I was kind of lucky, if you can call it that, that the decision was essentially made for me a few days later. In the grand scheme of things, unless things go full nuclear (which is always a possibility though still unlikely at this point), it probably doesn’t matter that much as long as there is food, water, shelter, and some community around you wherever you are. I guess if I knew the world was going to end tomorrow, I’d probably try to be around my family in Toronto, but short of that, I’m pretty fortunate to have people that care and have reached out to me from all over. Be around those people if you can, talk to them if you cannot.</p>
<p>Geekily enough, sprinkled in my ramping anxiety was this fascination about how things - the virus itself, information, people’s emotional reactions, and governments’ policy responses - propagated spatially and temporally from Asia to Europe to North America, with a noticeable lag. Obviously, had Italy and other European countries seen and learned from what happened in China and South Korea, an early intervention could have prevented much of the disaster that ensued, which inevitably resulted in a country/bloc-wide lockdown. The same is true for the US, and it really doesn’t seem like we’re willing to pull out the big guns until it’s necessary, but with such a long latency between infection and symptoms (&amp; death), when it starts to look necessary, it’s already set to run its course. It’s like an autonomous dynamical system - societies I mean - and people will do what people do, no matter what continent you’re on, even though what is required now is learning what somebody just before you failed to do, and doing it. What’s required is to be different, and in this case, to do things just a little earlier than everyone else. But anecdotally it seems that we as humans also have this inherent limitation in how far forward we can forecast, both individually and as a collective, otherwise climate change would’ve been solved.</p>
<hr />
<h3 id="somehow-it-feels-like-rudy-goberts-fault">Somehow it feels like Rudy Gobert’s fault</h3>
<p>My capacity to entertain these interesting oddities went straight out the window by the evening of March 10th, when it was announced that there would be a travel ban between Europe and the US. Even though I found out later that it was only a restriction for non-citizens to enter the US, it was pretty much a moot point because 1) I wouldn’t have been able to come back, however unlikely it is that it would still be a shitshow 3 months later, and 2) surely there would be a ban in the other direction soon enough (which the EU is currently discussing). Oh and on top of that, the same night, the NBA (and subsequently all major leagues) announced that the season would be suspended indefinitely. That makes total sense, but somehow hearing about Rudy Gobert being a complete fucking idiot touching everything on the podium before or after he got sick was just the perfect shit icing on this turd cake. I don’t know why, but him being the first NBA player (and more or less the first American celebrity while actually being French) to have acquired it and subsequently given it to Donovan Mitchell is entirely unsurprising to me. My lab was concerned:</p>
<p><img src="/assets/images/blog/2020-03-19-therapy.png" alt="" /></p>
<p>The memes though, those did brighten my day (all joking aside, obviously I hope the players go through a speedy recovery).</p>
<p><img src="/assets/images/blog/2020-03-19-dpoy.jpg" alt="" /></p>
<p>Anyway, this is what I wrote on the morning of March 11, it’s much more punchy than what I can summon up now so I will copy it over in full:</p>
<blockquote>
<p>“We are literally in a global pandemic. Pretty neat. Also, US has (maybe) banned all travel from (and to?) Europe, so there goes my Germany trip. I hope they route me the travel grant money still. You know, I’ve sort of fantasized about a doomsday scenario like this many, many times now, like an apocalypse type of situation, where we’d basically have to hunt for our own food. But more realistically, I was thinking that it’s kind of funny how I’m in the middle of a personal rollercoaster, with the uncertainty of travels and what to come after my PhD, but now there’s this literal pandemic that’s throwing a wrench into things, to say the very least, so in the grand scheme of things, my own turbulent future doesn’t seem all that unstable now, at least compared to what could very well happen in the next 10 days. And all of THAT seems wholly unimportant compared to what it must feel like for a friend to be carrying, due sometime after she becomes unemployed and uninsured, with or without the coronavirus happening. That’s a whole new level of uncertainty. The kid could be born into a completely different world.”</p>
</blockquote>
<p>And then, on the morning of March 12:</p>
<blockquote>
<p>“It’s 10am and dark outside because of the rain. It’s my last day at the Writing Hub, the bagel meeting has taken on a darker tone, like a pre-apocalypse type of vibe. The hub has mandated remote consultations and the team doesn’t really know what to make of it. I feel like this is the moment that we - I mean the world - collectively acknowledged, not the impending doom, but the impending uncertainty. We don’t really know what’s going to happen anymore. It’s funny, because we never really know what will happen in life within any reasonable prediction horizon, but this feels like a whole different thing. I’ve always felt like I bumbled through life without much explicit planning or foresight, but now, even more so. Twitter is being updated with ever more drastic news by the minute.</p>
</blockquote>
<blockquote>
<p>I woke up feeling down. Bummed. Maybe it’s the grey weather, or maybe the complete lack of stability is getting to me, and that I want to be able to be with family or someone I can hold onto during this time to know that at least some things will be okay and stay the same. It may very well be the case that all of this will blow over just fine, I will go on my Germany trip and do the work I want to do, and come back and defend all the same. That’s probably the most likely scenario, all things considered, but we’ll have to go through a lot more ups and downs to get to that point. But there’s always the possibility that it will not happen, because I don’t have a place yet and people may be averse to renting out, especially to a foreign person. Plus it’s not beyond the realm of possibility for European countries to close their borders. Even if I do get there, we are told to work remotely as much as possible anyway, and certainly not to travel for leisure when it’s not necessary. So maybe not going is just as likely. I’m sure I will have an update soon enough. It’s so interesting that we have to deal with all these new issues that nobody has ever dealt with before, and somehow a lot of people have opinions about it, while everyone still needs to go to work and do their regular jobs. My paper felt like a huge deal a month ago. Not so much anymore.”</p>
</blockquote>
<p>Apparently I was still entertaining going on the trip. What a silly man. Anyway, a few days later, it became clear that the wisest thing to do would be to at least delay the trip. From this point, which was Thursday, I basically couldn’t bring myself to do anything for the next 5 days, because…well, everything sucks.</p>
<p>And then finally, on glorious Pi-day, Saturday, March 14, when we were actually supposed to demo something at the Science Center (which obviously got cancelled):</p>
<blockquote>
<p>“I feel like I need to timestamp these on the hour now because things are happening so quickly. It’s hilarious when I think about how last Thursday (Mar 5) that there was a small rally at UCSD for grad student striking for COLA. 10 days later, we’re basically all fucked. I mean this is totally unrelated but I just think it’s funny.</p>
</blockquote>
<blockquote>
<p>I’m starting to realize that my mental health is deteriorating much faster than what the physical threat of covid poses, and a HUGE part of it is just being plugged into Twitter constantly to monitor what’s going on, and the floating uncertainty of everything. I suppose this is how people feel at a more local scale as well, leading to panic buying of stuff to have some sense of security. In the near to short term, I need to figure out a plan for Germany, limit my Twitter usage, and be grateful that I’m still in California around friends and colleagues who have in one way or another expressed that my staying is a great thing, as well as concern for my travels, including Brad who’s looking for ways to pay me on such a short notice, and also the friends and my parents who are not around but are reaching out to ask about my plans and expressing concern, all that is great.</p>
</blockquote>
<blockquote>
<p>It’s also interesting that I think I’m affected a bit more than your average person, from the global consequence, because of the planned trip. Like, not only am I affected by the status of the US node in this graph, but also the Germany and Europe node, as well as the edge between those nodes pertaining to travel restrictions. I’m also really thankful to have these friends in Germany that update me on what’s going on over there, and can advise me on what to do. Spoke to Leo on the phone today, had a much needed and hilarious conversation during this gloomy time. So for now, just gonna try to be rational, process my feelings, and carry on with my day.”</p>
</blockquote>
<p>I don’t know if I’m finding the right things “funny”.</p>
<p>At this point, the acceptance is settling in, and a pretty regular meta-experience I was having is that I would be engrossed in a fairly normal everyday activity - be it walking, reading, cooking, or having a conversation with a friend about some mundane affair - and suddenly snap out of that train of thought and realize that everything is not quite normal out there. I’m not sure what the best strategy is here, but I know being glued to my phone was not it. There was no need to read all of that all the time, because most of it does not inform any actual decision-making, but rather causes anxiety and <em>the perception of a need to make a decision.</em> But obviously being totally unplugged is probably not wise either. If only there were some kind of communication channel that simply relayed statistics and policy announcements without any sensationalist agenda, but surely, the news can’t be it.</p>
<hr />
<h3 id="settling-into-a-new-and-privileged-normal">Settling into a new (and privileged) normal</h3>
<p>So that’s how things progressed for me, and I’ve been stuck at home the last few days because that’s the new normal now, which is funny because it was also the old normal - I used to have weeks of just working from home because I was actually more focused here than in lab. But the last few days were anything but productive, at least for work, because I had to spend a lot of time just recalibrating, meditating, and processing my feelings, and I think that’s super important if you have the time to spare. When I was in the mood to do stuff, there were enough administrative and logistics-related tasks that I could still do and feel like I at least did <strong>something</strong> in the day. But today was the first day that I was able to sit down and even entertain thinking about complex mental tasks like writing a paper and programming. I can’t imagine that I’m the only one, and from what it looks like on Twitter at least, I’m not. Hard to say just how many people feel this way, though. Between the anxiety about the near future, the frustration of not being able to buy oatmeal to make my overnight oats, and the social isolation (or maybe in your case, intense social interaction with family members), I can’t imagine getting right back into a productive workflow, especially with the quarter coming to a close as well.</p>
<p>This period really made me understand a little better why I sometimes feel unmotivated to do things, because that also happened when life was normal (obviously), but I’ve always chalked it up to being tired or lazy or something. But I think having this overbearing weight of global anxiety, especially when it bombards you in all forms of virtual and IRL communication, really is a lot. So I guess I’m saying, cut yourself some slack. If you don’t feel like doing anything, and can afford not to because that’s the nature of your job, then don’t for a while. Take the time and use this staycation for yourself if you can. I saw a few tweets going around stating that academia should be considerate of the fact that a lot of its junior members will have been less productive in 2020 because of the insanity, and that promotion decisions should take that into consideration. On the one hand, that’s fantastic and 100% true. On the other hand, I can’t believe that <strong>that</strong> is the thing that so quickly surfaces to the consciousness of many amidst a global pandemic, and probably out of necessity. Thinking about it, if I were ever in a situation where somebody could not empathize with the fact that labs were literally closed and people were collectively going through a tremendously difficult time, I’d tell them to fuck right off no matter what golden ticket of a job it is.</p>
<p>Unfortunately, for academia, that does not seem like an unrealistic scenario. Over the last few weeks, actually, in combination with some other unrelated things like the UC-wide fight for cost-of-living adjustments and learning more about health insurance as a graduate student “worker”, I’ve become - and there’s no other way to describe this - fucking disgusted at how little the institution cares about its undergraduate and graduate students once money is involved, the latter of whom are the people that prop up a significant portion of this enterprise in the first place but get treated like replaceable figurines in a generic Lego set. My experience is with UC and UCSD specifically, but I can’t imagine other American public or private institutions being much better (my department’s cool though). And I’m not really sure who I’m upset at when I say the “institution” - the President? The Chancellor? The academic senate? Who the hell even makes these decisions?</p>
<p>This email from the Chancellor on March 13, though, announcing the latest round of COVID updates, including the decision that all instruction is to be online in the Spring quarter, had this gem of a blurb in it that made me chuckle and then shake my head:</p>
<p><img src="/assets/images/blog/2020-03-19-cashmoney.png" alt="" /></p>
<p>Like, full stop. Zero consideration of tuition changes as even a possibility, especially now that, a few days later, all campus amenities are closed. Am I just mad that the basketball courts are closed and I’m not getting a refund? Maybe. But seriously, what the hell are people paying for at this point?</p>
<p>Anyway, I can rag on the institution for a whole other blog post, but this is not the time or place for that. I guess just be aware of the fact that your undergrads and high school seniors are supposed to be having the best time of their lives this spring, and it’s nobody’s fault that they’re robbed of that, but they are. Which brings me to the next thing: this situation has given me a lot of perspective on how institutions, companies, and society at large treat segments of people based on their jobs, who and what are the most necessary when the structure we’ve set up seems most fragile, and how their experiences might be different during a time like this. The fact that I just wrote that my day-to-day is not that different from some days of my regular life, and that I’m waxing poetic to you here technically during paid work hours, is a tremendous privilege I have as someone who works mostly on the computational side of things, and as an information technology worker in academia in general (and also with a boss that doesn’t suck). A lot of people don’t have that, and are literally <em>losing their jobs</em> because people can’t go out to restaurants and shops anymore.</p>
<p>I feel like I’m learning about “how the economy works” in practice for the first time ever, because people not purchasing goods and services means that the people who provide those things do not have an income (wow galaxybrain, I know). Worse yet, there are a lot of people who would like to work from home because they are sick, or even preventatively, but cannot, and their company does not see a global pandemic as reason enough to pack things up, and honestly, I get it. Everyone from the CEO/owner to the entry-level employee needs cash flow to sustain a livelihood, though obviously some people can go for longer without an income than others. None of these issues surface, and things seem to operate smoothly as designed, when there’s no recession or pandemic, but when one does happen, it shines an ugly light on how fragile our economic infrastructure is. <strong>If food, shelter, and healthcare are basic human rights (are they?), then it would make sense for them to be provided <em>independently</em> from one’s ability to earn an income, especially at a time like this. Is that a society that we would like to work towards?</strong> I guess in this situation the government is supposed to use tax money to provide an economic and healthcare safety net for people, and the university could maybe spend some of the massive endowment that it keeps accruing and skimming off of R01s to foot the bill for some of its non-essential employees, since it’s a little unreasonable to expect the cash-cow undergrads to be paying a few thousand dollars this quarter for 4 Coursera courses???</p>
<p>My god, do not get me started on the healthcare system. I can’t. I will just say, thank god for doctors, nurses, and every other essential-services worker who is keeping this shit afloat. My garbage is still being collected, somebody still stocks, checks out, and bags my groceries, and my internet is still working, and I just pray that I don’t get so sick that I have to take advantage of the amazing healthcare system in this great land of freedom. As a poor person, I have never been so relieved to be Canadian at a time like this. Shoutout to the homeland, NBA Champs 2018-2020!</p>
<hr />
<h3 id="probabilistic-diagnostics">Probabilistic diagnostics</h3>
<p>Some closing thoughts: what would be the ideal course of action in this scenario, in terms of dealing with the disease itself (never mind the economic clusterfuck)? Obviously, it’s to develop a treatment as quickly as possible, but even then, you still have to know who has it and who doesn’t, unless we have a vaccine that works and can just carpet-bomb apply it. Short of a working medical intervention, having perfect knowledge of who has the virus and who doesn’t will get you a long way, because the rest of society can essentially function as normal, while those who are affected can be quarantined and treated as necessary in an orderly fashion, because the case numbers stop increasing. This prevents further spreading, and we don’t even need contact tracing. Contact tracing is only necessary when you don’t have the capacity to diagnose medically, so you make an educated guess about who will likely have it.</p>
<p>Now, are we implementing widespread testing in an unbiased way? Of course the fuck not, this is America! Essentially, this introduces uncertainty into your estimate of an individual’s status. But let’s allow the test itself to have some rate of false positives/negatives as well. In that situation, the ideal outcome (probabilistically) is to scale the number of interactions a person is allowed to have with the confidence of their status estimate: if you’re pretty sure you have it, stay home. If we’re pretty sure - but not completely sure - that you don’t have it, then you can probably interact with your local community as long as you don’t inadvertently become a super-spreader. This is, of course, assuming that part of the cost function is still trying to optimize for some social interaction, and not to eliminate it altogether, for mental health and economic reasons, because prolonged quarantine is probably not sustainable for either. Given these premises, we can build a model of how the virus will spread after taking into account the likelihood that an individual has it and how many people they will (or are allowed to) come in contact with, and simulate while tuning the parameters to optimize for both medical capacity and the interactions needed to maintain social and economic functions. You can even account for how necessary it is for them to be in contact with other people, depending on their capacity to deal with isolation and how necessary it is for them to go to work.</p>
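<p>To make that slightly more concrete, here’s a toy sketch (in Python) of the kind of simulation I have in mind. Every number in it - the testing rate, the test error rates, the transmission probability, the contact budget - is completely made up, and the “epidemiology” is cartoonishly simplified; the only point is the mechanic where each person’s allowed contacts shrink as our estimated probability that they’re infected grows.</p>
<pre><code class="language-python">
import numpy as np

rng = np.random.default_rng(0)

def simulate(n_people=1000, n_days=60, base_contacts=10,
             p_transmit=0.05, test_rate=0.1, test_fpr=0.02, test_fnr=0.1):
    """Toy spread model: daily contacts scale with confidence that a person is healthy.
    All parameters are made up for illustration."""
    infected = rng.random(n_people) < 0.01   # true (hidden) infection status
    p_est = np.full(n_people, 0.5)           # our belief P(infected); 0.5 = no idea
    for _ in range(n_days):
        # a random subset gets an imperfect test; beliefs jump toward 0 or 1
        tested = rng.random(n_people) < test_rate
        positive = np.where(infected,
                            rng.random(n_people) > test_fnr,   # true positives
                            rng.random(n_people) < test_fpr)   # false positives
        p_est[tested & positive] = 0.95
        p_est[tested & ~positive] = 0.05
        # contact budget shrinks as estimated infection probability grows
        contacts = np.round(base_contacts * (1 - p_est)).astype(int)
        # crude mixing: each infected person meets random others and may transmit
        newly_infected = np.zeros(n_people, dtype=bool)
        for i in np.where(infected)[0]:
            partners = rng.integers(0, n_people, contacts[i])
            newly_infected[partners] |= rng.random(contacts[i]) < p_transmit
        infected |= newly_infected
    return infected.mean()

print(f"fraction ever infected after 60 days: {simulate():.1%}")
</code></pre>
<p>You could then sweep the testing rate or the contact budget and watch how the final attack rate changes - that would be the “tuning the parameters” part - but that’s real epidemiology, which I am decidedly not qualified to do.</p>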
<p>So where are we in this situation? Given the fact that a person can be infectious while asymptomatic for somewhere between 2-14 days, we have fuck all clue about who has it based on visible symptoms alone, unless you have been completely isolated for 2 weeks. So the uncertainty on those estimates is basically maximal. Or unless you are deathly ill, end up at the hospital, and get tested, in which case you probably already have it and are naturally quarantined by virtue of being in the hospital. The policy right now is for everyone not in the hospital to stay home, which (I think) is actually the optimal policy given the uncertainty, as long as the healthcare part of the cost function outweighs social interaction - and it does, but probably not for infinitely long. So if you have it (symptomatic) or have no idea, stay home; if you somehow know you don’t have it (e.g., tested negative or having recovered from it), you should be able to move about freely, but with some upper bound on distance travelled and people seen, just in case; those are the only two possible current states right now.</p>
<p>If widespread testing is implemented, we effectively split up the first group, and move the high-uncertainty group into either the known-positive or known-negative group. But if there were some way to give a partial estimate of infection probability for an individual - like how many people you have come in contact with in the last 2 weeks, how many of those people are sick or have traveled a lot, any illness or symptoms you have (COVID or otherwise), and your current body temperature or whatever - you could then compute a score for how many people you are allowed to safely come in contact with. You could even make it into a phone app or something. An individual person obviously either has it or not, so you might be fine, or you might screw everyone else you come in contact with. But probabilistically and over the entire population, this could be a way to effectively flatten the curve while still allowing less drastic social-distancing measures to be in place. Furthermore, I think this would also give you an idea of how many people need to be tested for a given level of confidence in your estimate, so we can scale the preventative measures based on how certain our knowledge is, or prioritize testing for those who are in essential roles that must come in contact with other people (healthcare workers, food preparation and delivery workers, etc., and NOT rich people wanting to travel). But then I could see people starting to demand to see your COVID-19 score before deciding to interact with you, with a premium placed on a low score when hiring for essential services. This is starting to sound like an inadvertent mass surveillance nightmare, so I will just stop here.</p>
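<p>(Okay, one last thing before I actually stop: purely for illustration, here’s roughly what that score-to-contact-budget mapping could look like. All the weights below are invented on the spot and have no epidemiological basis whatsoever; the point is just the shape of the idea - a few self-reported features go in, a risk estimate comes out, and the daily contact “budget” scales down with it.)</p>
<pre><code class="language-python">
import math

def risk_score(contacts_14d, sick_contacts, symptomatic, temp_c):
    """Hypothetical infection-risk estimate in [0, 1] from self-reported features.
    The weights are invented for illustration, not fit to any data."""
    z = (-3.0
         + 0.05 * contacts_14d              # more recent contacts, more risk
         + 0.8  * sick_contacts             # contacts who were sick
         + 1.5  * symptomatic               # any symptoms yourself
         + 0.6  * max(0.0, temp_c - 37.0))  # fever above normal
    return 1 / (1 + math.exp(-z))           # squash to a probability-like score

def allowed_contacts(risk, max_contacts=10):
    """Scale the daily contact budget down as estimated risk goes up."""
    return round(max_contacts * (1 - risk))

# e.g., lots of recent contacts, one of them sick, and a mild fever
r = risk_score(contacts_14d=30, sick_contacts=1, symptomatic=False, temp_c=37.8)
print(f"estimated risk: {r:.2f}, allowed contacts today: {allowed_contacts(r)}")
</code></pre>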
<p>Stay safe everybody!</p>
<hr />
<iframe width="560" height="315" src="https://www.youtube.com/embed/Ks6VH3rtPRc" frameborder="0" allow="autoplay; encrypted-media" allowfullscreen=""></iframe>Richard GaoWhy am I occupying my consciousness (and yours) if I'm already filled to the brim with grim projections and never-ending bad news? Well, in part, it's therapeutic for me, because I've thought so much about it and it feels weird to be writing about anything other than this situation at the moment, that I need this brain dump...In any case, I figured I'd write about how shitty my last week has been, and some other scattered thoughts about the coronavirus here and there, so that maybe some of you will relate and find it cathartic as well.