Scott Wallsten: Jeff, thanks for joining me.
Jeffrey Macher: Scott, thanks for having me. I’m really excited to be here.
Scott Wallsten: Yeah, so you, along with a few co-authors, recently wrote a paper titled Generative AI as a Linguistic Equalizer in Global Science. So tell us about it, and kind of work backwards. Tell us what you found, and then how you went about testing your hypotheses.
Jeffrey Macher: Okay, great. I guess what we found that’s most interesting is this idea that scientific research, scientific papers in particular, are increasingly being written with generative AI. The overall uptake of generative AI is increasing over time. You find this across countries, especially in those countries whose languages are more distant from English, think the Middle East and Asia, and you also see it across scientific fields. Overall, the use of large language models, as evidenced in research papers, is increasing, but mostly in those countries that are linguistically distant from English.
Scott Wallsten: So this is kind of a democratization effect of AI.
Jeffrey Macher: It is a democratization effect, and researchers are using it to actually write the papers. The other big thing we find, because generative AI is being used, is that the linguistic similarity between non-U.S. papers and a corpus of U.S. papers is increasing over time. So what we do is, imagine taking all of the U.S. scientific papers in chemistry in 2022 and comparing them to a non-U.S. chemistry paper; you find that the linguistic similarity of those papers has increased, especially post-ChatGPT. This is also true across scientific fields, but it’s especially true for English-distant countries, as I mentioned; if the authors are all domestic, non-U.S., non-English speaking, so if they’re all, for instance, from Japan or China or somewhere in the Middle East, with no native English author on the paper; and for the lower-impact journals rather than the high-impact ones. So those are the main findings: it’s being used much more than it was, obviously post-ChatGPT it’s being used a lot more, but in particular in certain countries, and it’s creating not only this democratization but also this convergence in the similarity of the language that’s used.
Scott Wallsten: So let’s unpack this a little bit. One is the democratization effect, but you’re also identifying a ceiling effect, I think you called it, right? Where we don’t see this as much in the top-tier journals. What is it about these journals that’s keeping this effect from happening there?
Jeffrey Macher: We think that to get into those journals, the authors already have fairly good English language support, potentially editor services, such that the article is already going in pretty clean and pretty polished from a referee’s standpoint, so they don’t really benefit from it as much. But for the other, lower-ranked journals, there is probably less of an ability or less of a willingness to pay for those types of editorial services, and hence it’s being used a little bit more in that regard.
Scott Wallsten: That sounds like a very merit-based explanation. What are some other reasons you might see this, if I can push on that?
Jeffrey Macher: Well, you know, I think there are editor and reviewer requirements that exist at the top-tier journals. I also think there’s a selection effect here. If English is not your first language, you know you’re probably facing an uphill battle in two ways, one, the content of your scientific contribution, but also the quality of the writing, so you’re less likely to submit there. And editors that I’ve talked to at some of the journals that are seeing a lot more AI have indicated it’s become a lot more difficult for them to desk reject a paper, because what could have been desk rejected just based upon the quality of the writing has now somewhat been eliminated, and it’s now about the ability to understand what the true research question is. What I’m seeing, just in some general discussions I’ve had with individual editors that I know, is that more stuff is going out to reviewers than is being desk rejected, simply because it’s written better. And certainly in the journals that I’m targeting, which I’d argue are the top tier, I think that’s the case.
Scott Wallsten: So, if we assume that the space in journals is constant, although you do seem to constantly hear about new journals. But let’s just say that it’s constant. Something’s being crowded out, right? I mean, I guess there should be two effects. One is that somebody’s losing, because they’re not getting in anymore, and the other is that we should expect, even at lower-tier journals, the average quality to be increasing. And I know that’s not what you tested.
Jeffrey Macher: We should. Certainly it’s something we try to discuss. Two things, Scott. I think you’re entirely right. There’s broader participation, or at least there should be, from generative AI. It levels the playing field, if you will, or at least expands the pool of ideas on which future research can build. Hopefully that would enrich scientific progress, and in fact, it might even counteract this so-called slowdown that we have in disruptive innovation, at least in the scientific fields that we’re looking at. So broader participation overall is a good thing, but what about this crowding out? If, in fact, the number of journals has stayed the same, and we know it’s increased a little bit over time, it suggests that some good ideas from, say, an English-speaking author or author team that could have gotten published in the past might face difficulties going forward. So there’s this intensification of competition, a more competitive playing field, if you will. Now, our hope is this will spur additional efforts to increase the quality and the originality of the scientific effort. That would be ideal, right? Whether or not that happens is pure speculation on our part, but certainly it’s something that we hope for. What we hope for is that the Bulgarian physicist who doesn’t write English well but has a great idea can now potentially get published in a way that puts them on a level playing field with their U.S. brethren. That’s the overall ideal, and then, if that’s the case, does the traditional researcher who is published in those journals need to up his or her game? Overall, maybe that’s a good thing.
Scott Wallsten: I assume it could also work on the reviewer side. It could open up a new pool of reviewers, too.
Jeffrey Macher: Yeah, it could. Now, there are risks there that we don’t look at, but I don’t know if you’ve heard about these: a lot of reviewers now are using ChatGPT to do their reviews, and one of the problems with that is it’s not really the creative element that you want. We know that large language models are limited in their ability to be creative or provide creative answers, given the way in which they’re formulated. But yeah, it also could certainly allow all of these newly published authors to become active and effective reviewers.
Scott Wallsten: So what do you think about, to go off-topic just a little, referees using LLMs to either write or help with their reviews? I mean, sometimes reviews come back and they’re amazing, you know, so helpful. And sometimes they’re just absolutely terrible, and it’s not clear to me that ChatGPT would do a worse job than most referees.
Jeffrey Macher: It’s not. I know several of my colleagues who use this, because it is effective in providing certain ideas, and then they feel, through vetting what ChatGPT provides, they can write a better review. I know other journals have said, no, this is not appropriate. I personally have not done this, but I see the incentive. Reviews take me, on average, a whole day to do, and some people I’ve talked to get that down to two hours. So there’s this big incentive to save time, and, you know, if you don’t see a slip between the cup and the lip, between the quality of a ChatGPT-generated or ChatGPT-assisted review and your own review, then why wouldn’t you? But I don’t see a clear editorial policy put forth at most of the journals that are out there. There probably is an editorial policy at some, but it’s still confusing, and there are still a lot of individuals who will use ChatGPT anyway.
Scott Wallsten: One thing about AI we’ve seen, aside from the fact that every conversation has to be about it now, is how fast it progresses, right? You know, everyone wants it to be a complement, but already, I think, it can write papers much better than many I read that are written by people, and there’s no reason to believe that it won’t be better than people at writing the papers if you pose the question and let it answer it.
Jeffrey Macher: I agree on the writing, and ChatGPT is so good, you can say, write in a style of AER or write in a style of Management Science, and it’ll do that for you, and it is truly amazing. But the overall research question, the idea, I think, hopefully will still come from the individual researchers.
Scott Wallsten: Well, the idea, yeah. I took all the data from Apple Health on my phone, I probably shouldn’t have done this, and put it into ChatGPT, and asked it to look at my calendar and everything else on my agenda to see what my health outcomes correlated with. I’m not publishing this, it was just for my own interest, but that was a question, and I just asked it to answer it. Now, I didn’t go through to check to see if it was right, but, you know, as they get better, it seems like you should be able to do things like that, right?
Jeffrey Macher: I think that’s right. But hopefully, and I still believe this, and I’d be interested in your thoughts, ChatGPT still doesn’t have the creative element that exists between the ears of individual researchers. And will it ever? I’m not sure. I know of some studies that have looked at large language models whose training data is capped at the 1930s, the 1950s, the 1970s, and they ask the 1930s model to predict something that will happen in 1940, and supposedly, from what I understand, I might be wrong, they don’t get many things right. And if that’s true, then there are some limits on ChatGPT that will still require individual researchers to play an important role.
Scott Wallsten: Yeah, I agree with that, at least up until now. They don’t seem particularly creative, and when you ask them to try to do creative things, they’re not very good.
Jeffrey Macher: Yeah, now, will that change? Who knows? But, as of right now, maybe.
Scott Wallsten: So there’s a new paper by a group at Berkeley and Cornell, where they looked at publications. They found that researchers who used AI published more, but the work was lower quality by their measures, and we can discuss whether those measures are the right ones, but it sounds kind of consistent with what you’re saying.
Jeffrey Macher: Yeah, so my other two co-authors on this paper actually look, with another co-author from the University of Basel, at whether the effect is to increase the publications of the individual researcher, and they find something similar. So yeah, that is consistent. Since they’re on that paper, I think an interesting next question is, does it generate better, higher-quality journal submissions? And they actually find that, too. But one point that I want to come back to is, just like AI is the wild, wild west right now for how companies are using it, so it is for researchers in what’s called the science of science or the economics of innovation. Everyone and their mother is now using ChatGPT’s release as this type of natural experiment to look at its effect on something that’s science-related, and, this paper’s under review now, but we’re finding at least one other paper similar to ours every month. So there’s more and more of this. So you have to strike while the iron’s hot, because this can be used for a variety of different things. We just so happen to be looking at linguistic similarity, but other great questions are out there, like you just identified. Does it lead to author productivity improvements, quality improvements, things of that nature with respect to publications?
Scott Wallsten: But are you saying, when you say that there are a lot more papers like it, that there are more, copycat isn’t the right word, but lots of people trying to answer the same question?
Jeffrey Macher: Yeah, lots of people trying to answer that, or at least a similar question. We know of a couple of papers that are similar to ours. Ours is pretty old.
Scott Wallsten: Yeah, in this world.
Jeffrey Macher: Yeah, in this world, but to be honest, there are others that are maybe taking a different scientific-field approach, you know, related only to medicine. We’re very broad. We look at many Scopus categories, and we put them into four buckets: physical science, life science, engineering and technology, and social science. What we document, instead of being specific to one scientific field, is that this is happening in every scientific field, at varying rates, but nevertheless it’s occurring across all fields. It’s a broad phenomenon.
Scott Wallsten: It kind of sounds like this is also potentially part of a solution to the replication crisis, right? Especially once somebody has had the idea and figured out how to do it, it’s easy to try to look at it under different circumstances.
Jeffrey Macher: It is pretty easy to look at it in the way in which we’re doing it. We try to be robust in what we look at and what we vary, but yeah, you can do deeper dives into how we determine whether something is a GenAI-assisted paper or not, whether author teams matter, the size of author teams, things of that nature. You could look at whether the keyword thresholds we use are at a level that suggests it’s a GenAI-assisted paper versus not, whether you need more than one AI keyword in a paper, so this idea of stricter identification. And then how you calculate your reference set, specific to a scientific field or just in general. And there’s this broader question: should we be basing everything on the English language? I personally think we should, because it is the standard by which international scientific publication occurs, but you could vary things there and change what you want to look at. So there are many things a researcher interested in GenAI’s effect on scientific publications can do. We took a very broad-brush approach, and we try to show empirical robustness as to where it’s occurring, and how, and when. What I think is interesting is that it’s all going to be over in a couple of years, because if everyone adopts GenAI, you’re not going to have non-GenAI-assisted papers to compare against. That’s why we think this is just the appropriate time to write this paper, and hopefully, knock on wood, get it published.
Scott Wallsten: Yeah, right. No, I mean, I think it definitely seems like the right time. We just posted a blog where we used Pangram, one of the AI-detection tools, to see how common AI is in regulatory filings, and the answer is not very, yet. But you didn’t have these tools. So talk a little bit about how you detected AI.
Jeffrey Macher: Okay, yeah, so you could argue this is a good way to do it, or you could poke holes in it, but what we do is we identify GenAI publications the following way. We look at the number of scientific articles from authors in non-English-speaking countries two years before ChatGPT’s release and two years after. So we search what’s called the Scopus database, a database of publications, and we use 65 common GenAI keywords, think of these as markers, in their titles and abstracts, and these markers are words that other researchers have associated with detecting GenAI-generated text. It’s words like delve, groundbreak, intricate, meticulous, realm, revolution, showcase, underscore, unveil, elevate, valuable, crucial, and empower. And what we do in order to determine whether we think it’s GenAI or not is we look at whether those keywords have increased from 2021 to 2024 by more than 300%, so a four-fold increase, and that’s our base, and then we vary that to determine…
Scott Wallsten: It’s pretty conservative, too. I mean, in the sense of giving people the benefit of the doubt that they didn’t use it.
Jeffrey Macher: Yeah. So we look all the way up to a five-fold increase, and we don’t find much difference there. So what you have then is, imagine 5.6 million papers that have non-English-speaking authors. Well, then you can say, okay, which countries are more or less likely to use GenAI, based upon the author-institution affiliations. You can generate some pretty interesting descriptive statistics for country and scientific-field AI uptake, and then what we eventually do in a difference-in-differences estimation is look at the similarity of those papers that use GenAI to a corpus of English-language publications. So that’s how we did it. Think of it as a pre-ChatGPT period between 2021 and 2022 and a post-ChatGPT period between 2022 and 2024, looking at which countries were more likely to use GenAI using these lexical markers, and then how much their similarity to a U.S. corpus changed. And that’s the main approach, and not surprisingly, well, maybe surprisingly, you find GenAI uptake is much larger in countries that are linguistically more distant from English, and because of that use of GenAI, they’re converging much faster to the corpus of English scientific papers.
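To make the keyword-marker flagging and the difference-in-differences comparison concrete, here is a minimal Python sketch. It is not the authors' code: the input file, column names, the small subset of marker words, and the exact regression specification are illustrative assumptions based on the description above.

```python
# Minimal sketch (not the paper's actual pipeline) of flagging likely GenAI-assisted
# papers via lexical markers and running a simple difference-in-differences.
import pandas as pd
import statsmodels.formula.api as smf

# Illustrative subset of the ~65 GenAI marker words mentioned in the interview.
GENAI_MARKERS = {"delve", "groundbreaking", "intricate", "meticulous", "realm",
                 "showcase", "underscore", "unveil", "elevate", "crucial", "empower"}

def marker_count(text: str) -> int:
    """Count GenAI marker words appearing in a title + abstract."""
    return sum(tok.strip(".,;:") in GENAI_MARKERS for tok in text.lower().split())

# df: one row per paper, with columns (all assumed for illustration):
#   year, country, english_distance, title_abstract, similarity_to_us_corpus
df = pd.read_parquet("papers.parquet")              # hypothetical input file
df["n_markers"] = df["title_abstract"].map(marker_count)
df["genai_flag"] = (df["n_markers"] >= 1).astype(int)  # stricter variants require more markers

# Country-level uptake: did average marker usage rise by more than 300% (4x) from 2021 to 2024?
rates = df.groupby(["country", "year"])["n_markers"].mean().unstack("year")
uptake_4x = rates[2024] > 4 * rates[2021]

# Simple difference-in-differences: similarity to the U.S. corpus,
# post-ChatGPT period interacted with linguistic distance from English.
df["post"] = (df["year"] >= 2023).astype(int)
model = smf.ols("similarity_to_us_corpus ~ post * english_distance", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["country"]})
print(model.summary())
```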
Scott Wallsten: Doing this again, would you take that same keyword approach, or would you use some of these new tools?
Jeffrey Macher: Yeah, I would use the tools. I mean, if we were doing it again, probably tools. We are using some tools. We’re using, I don’t know if you’re familiar with BERT, I’m sure you are, the Bidirectional Encoder Representations from Transformers approach, this idea that…
Scott Wallsten: Of course, who isn’t familiar with that?
Jeffrey Macher: This LLM is designed to understand the meaning of text, the textual meaning. So it’s good for search, it’s good for sentiment analysis, it’s good for answering questions; it’s sort of an ideal model for comparing and contrasting two separate things via these text embeddings. And SciBERT is a variant that we use; it’s trained on just scientific texts, and it has a very domain-specific vocabulary. So, in that respect, it’s good, but to your point, instead of using these keywords, arguably a better approach would be to use the tools that are now a little bit more widely available. The other knock on our paper, I know we’ll talk about limitations in a minute, but one limitation is we only look at titles and abstracts instead of the whole text. So you could argue that ours is a lower bound for whether or not something is GenAI, and whether or not something might have greater linguistic similarity.
Scott Wallsten: Why did you make that decision? Just a resource constraint?
Jeffrey Macher: Yeah, I mean, it’s computationally intensive. With the 5.6 million articles and their co-authors, you end up with about 11 million observations, and then you’re creating a 768-dimensional text vector for each with which to analyze this, so it requires some pretty strong computational power. But, ideally, if you had greater power, you could move to something that would include the main text.
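As a rough illustration of the SciBERT step Jeff describes, the sketch below embeds a title-plus-abstract into a 768-dimensional vector and scores its cosine similarity against the centroid of a U.S. reference corpus. The mean pooling, the centroid comparison, and the placeholder texts are assumptions for illustration; the paper's actual reference-set construction may differ.

```python
# Sketch: embed a title + abstract with SciBERT and compare it to a U.S. reference corpus
# (assumed workflow, not the authors' code).
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")
model = AutoModel.from_pretrained("allenai/scibert_scivocab_uncased")
model.eval()

@torch.no_grad()
def embed(text: str) -> torch.Tensor:
    """Return a mean-pooled 768-dimensional SciBERT embedding for one document."""
    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    hidden = model(**inputs).last_hidden_state       # shape (1, tokens, 768)
    mask = inputs["attention_mask"].unsqueeze(-1)    # ignore padding tokens
    return (hidden * mask).sum(1) / mask.sum(1)      # shape (1, 768)

# Placeholders for real Scopus title + abstract text.
us_abstracts = ["...", "..."]
us_centroid = torch.cat([embed(t) for t in us_abstracts]).mean(0, keepdim=True)

focal_abstract = "..."
similarity = torch.nn.functional.cosine_similarity(embed(focal_abstract), us_centroid).item()
print(f"Cosine similarity to U.S. corpus centroid: {similarity:.3f}")
```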
Scott Wallsten: So let’s step back and talk about innovation in general. The economics of innovation is your broad area, and you’ve written about lots of issues. In 2008, you edited a book on innovation, and it’s almost 20 years later, which, on the one hand, means it’s unfair to ask you to remember it, right? But what were sort of the conclusions in that book then, what do you think has stayed relevant, and what’s different now?
Jeffrey Macher: Yeah, that book was actually a sequel to another book that looked at U.S. competitiveness, and mainly in production.
Scott Wallsten: And actually, that first book was 1999, right? That was almost a 10-year gap.
Jeffrey Macher: Right, right. So that book was trying to make the argument that the U.S. remained competitive, mainly in production, in a variety of high-tech industries. And then the 2008 book with my advisor and co-author, David Mowery, tried to look at whether or not that remained true, and what we found was that the value chain was beginning to disintegrate between and among value chain stages. Sorry, that’s kind of redundant, but the idea was that you saw this pull of design or production activities, either to be proximate to the customer or proximate to manufacturing. The U.S. still maintained a competitive position in a variety of industries, including biopharmaceuticals, semiconductors, software, and financial services, but the overall outcome, where in many industries we were seeing global value chains, with design occurring across the world and potentially not in the U.S. anymore, was that U.S. competitiveness was eroding. So there was this question and concern around national innovation systems, this idea behind clusters of innovation, whether advantages in basic research related to university labs provided an advantage in innovation, invention, commercialization, things of that nature. And, almost 20 years later, that has further eroded. There’s a greater amount of innovation occurring outside the U.S. The U.S.’s stronghold in innovation is being threatened by natural forces that have occurred over the past 17 years, but as we see right now with the Trump administration dismantling our close relationships between government, industry, and universities, you know, via NIH funding and whatnot, that’s just going to accelerate. And then there’s this idea of AI playing an important role. If AI truly democratizes the ability to do some innovative things, it should erode further. There are concerns going forward, I think.
Scott Wallsten: Although, earlier you said that AI could counteract the slowdown in innovation, so I guess you mean globally, but at the expense of the U.S.’s relative advantage.
Jeffrey Macher: That’s correct. I think overall, the idea behind AI might be to accelerate things, so there’s a pretty famous paper now by Funk and Owen-Smith that looks at this. It’s called the Consolidation-Disruption Index, and it’s in science, and what they argue, looking at both patents and publications, is that, based upon the citation network, if a forward patent cites me and the patents that I cite, it’s consolidating. It’s not really disruptive innovation. But if it only cites my patent and not the prior patents that I cite, it’s more disruptive. Now, their paper has some questions as to how they measure things, and there’s some bias in that, which I’ve actually written on, but what they find is that the overall disruptiveness of invention, measured through patents and measured through publications, has slowed over time. Now, what AI might be able to do is reverse that, in general. Facilitate more disruptive innovation. Instead of building on the shoulders of giants, you’re standing on your own. And that’s a broad…
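For readers who want the mechanics, here is a small sketch of the CD index logic as Jeff describes it: a later work that cites only the focal patent counts as disruptive (+1), one that cites both the focal patent and its predecessors counts as consolidating (-1), and one that cites only the predecessors counts as 0. Implementation details such as citation windows and data cleaning, which vary across papers, are assumed away here.

```python
# Sketch of the CD (consolidation/disruption) index for one focal patent,
# following the +1 / -1 / 0 logic described above (citation windows omitted).
def cd_index(focal: str, backward: set[str], citations: dict[str, set[str]]) -> float:
    """
    focal:     id of the focal patent
    backward:  ids of the patents the focal patent cites (its predecessors)
    citations: map from a later citing patent's id to the set of ids it cites
    """
    scores = []
    for citer, cited in citations.items():
        cites_focal = focal in cited
        cites_predecessor = bool(cited & backward)
        if cites_focal or cites_predecessor:
            # +1 if only the focal is cited (disruptive),
            # -1 if both the focal and a predecessor are cited (consolidating),
            #  0 if only predecessors are cited.
            scores.append(-2 * cites_focal * cites_predecessor + cites_focal)
    return sum(scores) / len(scores) if scores else 0.0

# Toy example: two later patents cite only the focal, one cites the focal plus a predecessor.
later = {"p1": {"focal"}, "p2": {"focal"}, "p3": {"focal", "old1"}}
print(cd_index("focal", {"old1", "old2"}, later))   # (1 + 1 - 1) / 3 = 0.333...
```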
Scott Wallsten: But in this case, this is AI as a general-purpose technology, not AI itself, right?
Jeffrey Macher: That’s right. And if AI is a general-purpose technology, I think that’s the key. If it levels the playing field, but then also forces individuals to be more innovative, maybe we move out of this supposed disruptive-innovation slowdown that we’re in, and it brings in better science and more effective science. It’s a big question. I don’t know what the answer is, but certainly that’s the hope. Now, to your point that you just raised, does it affect U.S. competitiveness? I think, yeah, it’s going to have an effect, not marginalizing the U.S., but it has the potential to further limit U.S. competitiveness as it relates to innovation, either disruptive or novel innovation, depending on how you want to measure it.
Scott Wallsten: Well, that poses some complicated questions for policymakers, right?
Jeffrey Macher: Yeah.
Scott Wallsten: I mean, we all, well, we all should want innovation globally to increase, because ultimately we’d all be better off, but we don’t want to be relatively worse off. So how would you think about that?
Jeffrey Macher: So there are a lot of things to consider. All right, what’s going to allow a country to create clusters of innovation that are long-lasting? In a related paper that I have, we show that one thing that seems to have an effect is having a good basic-science understanding of emerging technologies, think automation, AI, GPS, things of that nature. We look in another paper at 27 different types of disruptive innovations, and one of the things that stands out is that a basic-research understanding, being a pioneer region in basic research like AI or automation, has a long-lasting effect on invention, your ability to patent. And then Bloom et al. look at the ability of patents to generate commercialization in their QJE paper, and they find, actually, that there’s this diffusion from invention to commercialization relatively quickly. What we find, though, is that there’s this pretty long-lasting effect from basic to applied research. So policymakers can foster investment in areas that connect universities to either entrepreneurial firms or industries that can use the research that’s generated in a way that creates invention. But that suggests that policymakers should be picking particular industries to fund, and I’m not sure that’s something we want to do. It’s certainly something that the U.S. hasn’t done, but others, certainly Taiwan, have done it, to great acclaim with semiconductors, TSMC in particular. India, to a large extent, in software. So I don’t know what the right model is, but what we do find is that there’s this great connection between basic research and applied research, from scientific papers produced to patents applied for and granted. And from that, commercialization does occur. Policymakers need, and there are other things that I could talk about here, but the idea of this is scary, to be picking winners and losers, or the industries that a national innovation system is going to emphasize, is…
Scott Wallsten: I mean, the other scary part is that we seem to be defunding basic science.
Jeffrey Macher: Yeah, even worse, right? That was the one thing that we were always good at, the one thing that we always had an advantage in. If you look at most measures of country-level innovation, be it forward citations, number of patents produced, number of high-impact patents, and this CD index, you find the U.S. is in a relatively good position. How that changes, well, let’s just say this, Scott: by the time you and I retire, people will be looking at the Trump administration’s effect on U.S. innovation, performance, and position in a pretty cool quasi-natural experiment, using…
Scott Wallsten: They might not be looking at it here.
Jeffrey Macher: You’re right. Yeah, because everyone’s gone. They’ve all gone to Canada, to other countries, you know, countries that are going to continue to fund it. Yeah, you’re right.
Scott Wallsten: So we’ve covered a lot of ground, and I think we should wrap it up. That’s a good place to stop. Jeff, thanks so much. It’s always fun talking to you.
Jeffrey Macher: Scott, it was great. Thank you, this was really fun. I had a good time.
Scott Wallsten is President and Senior Fellow at the Technology Policy Institute and also a senior fellow at the Georgetown Center for Business and Public Policy. He is an economist with expertise in industrial organization and public policy, and his research focuses on competition, regulation, telecommunications, the economics of digitization, and technology policy. He was the economics director for the FCC's National Broadband Plan and has been a lecturer in Stanford University’s public policy program, director of communications policy studies and senior fellow at the Progress & Freedom Foundation, a senior fellow at the AEI – Brookings Joint Center for Regulatory Studies and a resident scholar at the American Enterprise Institute, an economist at The World Bank, a scholar at the Stanford Institute for Economic Policy Research, and a staff economist at the U.S. President’s Council of Economic Advisers. He holds a PhD in economics from Stanford University.



