I believe the answer is clearly “no”, but judge for yourself. The emails make interesting reading.
This began when an acquaintance of mine, who does not have tenure and shall remain nameless, asked David Monaghan and Paul Attewell for the statistical code for their paper, “The Community College Route to the Bachelor’s Degree.” The paper was published in a top journal, Educational Evaluation and Policy Analysis, and received quite a bit of publicity (e.g., Washington Monthly, Inside Higher Ed, Science Daily). He was having trouble replicating some of their descriptives, and mentioned they seemed reluctant to provide their code.
I thought that was odd; Attewell is very well known in our field, and I couldn’t for the life of me understand why someone of his stature would be unwilling to share code. Because I believe we should be engaging in more replication efforts in postsecondary research, I made a similar request to see what would happen. Here are the emails:
I read your paper “The Community College Route to the Bachelor’s Degree” with great interest. I’d like to replicate your results, could you send me the code?
Great to hear of your interest in the paper. I would love to be of help. The difficulty is that the code (especially for variable creation, dataset merging, etc.) is scattered across a number of do files. Perhaps if you could be specific about a particular analysis or set of variables I could answer your questions more readily.
Thanks for the quick response. As I said, I wanted to replicate the entire paper.
I don’t mind sorting through the do files, you can send them on and I will figure them out.
Just following up on my request for the do files for your paper.
I apologize for the delay. Honestly, I am a little pressed for time right now, and going back and hunting down all of the relevant do files is not something likely to happen any time soon. This paper came out of a larger set of analyses carried out by Dr. Attewell, myself, and others, and a lot of the more complex variable creation was carried out for these more general purposes. What I can and will do for you is send you the last do file – the one which produced the final analyses, and then try to answer any questions you have about what we did. I hope you feel that that is sufficient.
I am in the process of moving and will be occupied for the next few days. I can send you that do file next week if you like.
A single syntax file using previously created variables and datasets from other, unknown syntax files is useless for replication purposes, as I’m sure you’re aware. So let me be blunt, as is my wont.
You seem very unwilling to share the syntax files used to create your paper. Your excuse that you just can’t be bothered to assemble the syntax files for a very recent paper is simply not credible.
At this point, I can only conclude that 1) your syntax files are such a mess that you can’t produce the code you used; in other words, you find it impossible to reproduce your own research; or, 2) you have the code, but made some questionable choices in the course of your research and are now worried about being found out.
As the research scandals in political science (Michael LaCour) and social psychology (Diederik Stapel) demonstrate, replication is the key to advancing knowledge in a discipline. The fact that you and Paul are unwilling to share your syntax files, and allow others to replicate your well-publicized research findings, speaks volumes about your dedication to science and your ethics as a researcher.
I am cc’ing Paul on this, in case he does not know what is going on.
I am sorry that you feel this way. I am also slightly confused as to why you have escalated our dialogue rapidly to insinuations of fraud when I simply offered to give you something less than precisely what you requested. That you don’t find my explanations believable is unfortunate, because I believe that a moment’s thought will reveal them to be reasonable.
First, let me suggest to you that your request is a bit excessive. You are not simply asking for the code to part of our analysis, or for our way of constructing one or a set of quantities. That could be readily accomplished. And I would expect such a request if, for instance, you tried to replicate our results based on the descriptions we gave in the paper. Have you done this? If so, is there a discrepancy between the results you are obtaining and the results we reported? If so, you and I could compare how it is that we constructed certain variables in an effort to figure out why the differences are arising. This would be a potentially interesting and productive exchange. But you are not requesting this. You are asking for the full road map to go from a number of separate data sets (the BPS, the PETS, and the NPSAS) to the results in the paper. That, in all honesty, is the result of quite a bit of hard work carried out by Dr. Attewell, myself, and a number of other researchers. Moreover, as I mentioned, we used the BPS and the PETS data to carry out a number of analyses (beyond simply the analyses in this paper), and the baseline code (e.g., for making the “terms” data in the PETS direct file usable to isolate student progress through semesters) is in a number of separate do files. I can certainly find them and send them all to you, but doing so would require me to take a bit of time to figure out exactly what you would need. I simply do not expect to have this time in the very near future.
Second, I think we are being quite transparent here, though clearly not in the manner you would prefer. We provided, in the paper itself, clear descriptions of what we did. Next, I offered to explain in detail anything which remained confusing. Finally, I offered to send you the do file which produces the actual tables in the paper. That you can get next week (the do files are on a restricted access hard drive and I won’t be in the same city as that hard drive until Monday at the earliest). Once again, I would explain anything that you found confusing at that point.
The tone of your reply was somewhat curious, though, and I think you will understand that it has made me in turn more curious about your goals. So I must ask you for a bit of transparency as well. What are you investigating, and what role does replicating our paper play in your research plan? Are there particular aspects of our argument or findings that you find particularly troubling, unbelievable, or objectionable? If so, what are these? I understand that you do not have to answer these questions if you are not so inclined, but you portray yourself as someone who both appreciates and regularly indulges in full disclosure. And understand that knowing a bit more about your project may lead me to be more invested in helping you out.
Hi Dave and Paul,
As for my transparency, read my very first email to you: I said I wanted to replicate your paper. This means I want to see if I can come up with the same set of numbers you reported in your paper, or whether those simply can’t be replicated. Replication is a vital part of the scientific endeavor. The issue is not whether your findings are objectionable or troubling; the issue is simply whether your findings can be replicated by someone else.
Your argument that my request is excessive is just plain silly. Most quant papers involve quite a bit of work, and yet scholars share their code all the time. If the “But I’ve worked SO hard on this!” excuse were valid, no one would ever share their code, and the major journals in economics and political science would not require authors to make their code available (as well as their datasets!).
Your idea that we could somehow compare discrepancies is also silly. So many decisions come into play with sample definition, variable creation, and missing data decisions that we could spend hours on the phone or via email trying to figure out what was going on, when letting me see your code would easily solve the problem. Same goes for me trying to replicate your numbers on my own – your paper does not contain all of the information contained in your do files, and to argue otherwise is disingenuous.
If you are offering to send me the do file that replicates the tables in your paper, and all I have to do is run the do files against the datasets from DOE, then you would be fulfilling my original request. But your previous emails (April 30 and May 29) made it sound as if this do file was pulling in datasets created by other do files. So the do file you are offering to send me is useless for replication purposes, as you are fully aware.
You keep insisting that you are too busy right now to comply with my request. Fine, I am a very patient man. Please provide me with the date by which you and/or Paul will send me all of the code used to generate the analyses in your paper. Or, quit putting me off with excuses and just state that you and Paul refuse to make the code available. And be sure to safeguard those computer files when you move, it would be a shame if they were suddenly lost.
Nothing in your last email has convinced me that my previous statement was incorrect. Frankly, if you and Paul don’t enjoy being publicly called out, don’t publish in top journals and then refuse to share your code with other scholars. We are all very busy, but that does not absolve any of us of our responsibility to be open and honest about our methods to the scientific community.
I am increasingly annoyed about the tone as well as the content of your email communications with Dave Monaghan and myself. Threatening to “Call us out” and references to academic dishonesty without a shred of evidence that we have done anything wrong are insulting. Our results in the EEPA article are in no way controversial. NCES and others have published their own analyses of transfer and the loss of credits, and in general their findings are consistent with ours. Our paper provides plenty of details needed for someone to replicate our results, and we have previously offered to help answer any of your questions or address any issues.
I have been discussing the issue of sharing code with colleagues and I intend to communicate with the editors of EEPA to seek their advice on this matter. But so far what I have heard suggests that your request is what is aberrant. Of course replication is important. You are quite welcome to replicate our research. You can accomplish that by obtaining this restricted government data and programming the many analyses that constituted our paper.
From the beginning, Dave has offered to consult if you have any questions about how we specified variables beyond what we wrote in the published paper. So please go ahead and replicate our analyses if you wish. We believe that you will find that our EEPA article is sound, in its findings and in its methods.
No one I have spoken to so far has suggested any requirement or custom that the authors of a paper should provide all their statistical and database management programming to others. The programming represents many months of Dave and others’ work including various analyses beyond the EEPA paper. It involves several initial datasets, multiple do files, building several intermediary datasets, merges and so forth. It was not one simple program, or single statistical model. The paper involved dozens of models, propensity matching runs, and so on.
Dave was not exaggerating when he told you it would be very time consuming to provide you with the entire programming in the form you have requested.
You seem to believe that “replication” means providing you with everything you need to simply push the run button on a do file and get the output from which we published, and that you are entitled to demand the considerable effort required for us to provide you with that kind of comprehensive program.
Unless and until we are convinced that this is required of scholars in our field, we do not intend to put our other work on hold so that you can perform an effortless replication.
You say that sharing data and programs is a requirement of major journals. This is not the case in our discipline (sociology) and we will find out whether it is EEPA’s written policy.
As you know, the data in our case is restricted by the federal government to licensed users, so that could not be shared under any circumstances.
We will seek others’ advice on this matter.
In any event, I want to hear no more threats from you or references to fraud or malpractice.
Ah, the accusation of “tone” finally appears. I’m surprised you didn’t also accuse me of “bullying”, as that seems to be the tactic used by academics who don’t have a leg to stand on and don’t want to admit it. Better to complain about something nebulous than deal with the facts at hand, right? Let’s stick with the facts instead.
I don’t know why your “in no way controversial” findings matter for this discussion. Whether your findings are pedestrian or outrageous has nothing to do with whether they should be replicated: all scientific findings should be replicated.
You know as well as I do that EEPA has no policy on this, and that the editors will tell you there are no sharing requirements. The only reason you are “consulting” with the editors is to provide a veneer of cover for your refusal to make your code public.
Once again, you bring out the tired defense of all the hard work you’ve put into the code, which is why you can’t share it. You make it sound as if you and Dave are the only academics who spend time working on their statistical code. Please get over yourselves.
I’m not asking you to go through the code and insert comments explaining what you did; I’m not even asking for you to explain what each do file does. As I explained previously, just copy all the do files you used and send them to me; I will figure out the rest. If you’re uncertain how this works in practice, try right-clicking on the files and choosing “Copy”.
You mention that sharing code is not the case in sociology. My experience with sociologists, other than yourself, is twofold. I requested files from Jennifer Lee at Indiana for her 2007 paper in Sociology of Education; so much time had passed that she could not find her files due to switching computers, but she attempted to comply with my request by searching her computer files for the code. I also asked Laura Hamilton at UC Merced for the code she used in her recent American Sociological Review paper on college finances, and she happily complied. Neither one thought my request was “aberrant”, nor did they think it was outrageous that I expected them to give me their code, so that I could replicate their results at the push of a button.
At least we have closure, in that you are now explicitly refusing to share your code. I find it revealing that Dave agreed in an earlier email that at some point he could send me all the code, but when pressed for a date by when this would happen, providing the code suddenly became impossible.
By the way, I’m not sure what you’re talking about in terms of my threats. If you go through my emails, there are no threats, simply a statement that if you don’t like being publicly called out for hiding your code from others, then don’t publish your research.
So here is where we are:
- Given your refusal to make your code public after multiple requests from two different researchers, I believe a reasonable person would assume that either a) you are unable to produce the code, or b) you’ve made some questionable choices in your analyses and don’t want others seeing what you’ve done in your code.
- In order for the field to advance, it is essential that scholars cooperate in replication efforts. At a bare minimum, this means cooperating with simple requests, e.g., sharing electronic files such as statistical code.
- Because you and Dave refuse to make your code public, I think you’re both sorry excuses for scholars, and a shining example of why social science research is in such a sad state. You want to claim the glory of publishing in a top journal, but refuse to be open with other scholars as to how exactly you got your results. Truly pathetic.
- You can spare me the pompous pronouncements that you want to hear no more: I’ll say whatever I please to whomever I please. Feel free to sue me for my statements. My wife is a former attorney, and she has explained the joys of the discovery process to me.
You mentioned that you were annoyed with my emails. This will likely annoy you as well:
I am more than happy to post any responses by you and/or Dave on my blog.