I often edit a manuscript draft with co-authors by sending drafts of a document back and forth via email. Depending on who I am working with, cloud-based solutions to working collaboratively are not always an option.
After some back and forth a lot of interim drafts begin floating around, so a lot of my co-authors either initial, date, or number a working document when we send it back and forth for edits. Some do none of this at all. I am wondering if there is a correct way to “name” a document when collaborating via email. Is there a consensus on best practices for file-naming conventions when collaboratively editing via email?
I come from a field where none of these answers are going to work. Remember, most scientists aren’t computer scientists. As a grad student, sending professors arcane rules for naming conventions would probably just be ignored. Everyone has their own usual pattern, and even if they wanted to be helpful, might just forget when it comes time to save. What irks me is people that put spaces in filenames.
As much as I plan to improve this process if I ever run my own lab, here’s what you have to do:
The first author (or author leading the publication, or the corresponding author, or someone picked to facilitate) runs the show.
- When you send out a draft, state a date for when you’d like to receive comments by (two weeks is a good rule-of-thumb). Don’t make your own edits in the meantime if you can help it.
- When you send out a draft, put a date on it. When people start sending you comments, sometimes people will be kind and edit one that someone has already edited. IME, that doesn’t always happen.
- At the end of the time period, or once you’ve received everyone’s comments, use Word’s document merge tool.
- Save with the new date, and start incorporating edits and responding to comments.
- Rinse and repeat.
You will end up with a lot of files with different dates. I keep only the files from step #4 (the merged draft saved with the new date), once I am confident in the merge. Frankly, space is cheap, and personally I find it easier to open paper-200303.docx to find an old comment than to use the revision tools (for Word). When the paper is accepted, you can delete the old versions.
Not using version control is bad.
Depending on who I am working with, cloud-based[, version controlled] solutions to working collaboratively are not always an option.
You can use cloud-based solutions even when some collaborators are against them. All you need to do is: Download and email the cloud version to collaborators that refuse to use the cloud, and upload whatever they send back.
The key to modern version control such as git is knowing the parent documents of a document. You thus need to be able to reconstruct which version is the immediate predecessor.
So, ask them to mark their version with their new version number and name, at the least, but, ideally, also with the version from which they constructed it (or several of these if it was a merge).
Thus, at the very least, OP could use <name>-<major>.<minor>-<initials>.txt.
So you could deduce that rollingstones-4.2-PK.txt was derived by PK from (probably) 4.1, and that rollingstones-4.2-IR.txt probably also has 4.1 as its parent but was modified independently by somebody else. When you merge versions with the same number, you can omit the author and just give the result the next number, e.g. rollingstones-4.3.txt for a merge of the two previous ones.
If you can afford to and people are disciplined, it would help to mark the immediate predecessor, though: rollingstones-4.4-UM-from-4.3-PK.txt. This is a bit clunky and a poor imitation of modern VCS such as git, but it allows you to deduce the parent(s) of the present version which is all you ultimately need.
To facilitate that, ask people, directly on downloading the latest version, to duplicate it and modify its name immediately to reflect the parenthood of the downloaded version.
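To show that the parent really can be read back from such a name, here is a minimal Python sketch. The exact pattern (upper-case initials, an optional "-from-" part) is my own assumption based on the example names above, and PATTERN, Draft and parse are hypothetical names, not part of any tool.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern (not a standard), modelled on the example names above:
#   <title>-<major>.<minor>[-<AUTHOR>][-from-<major>.<minor>[-<AUTHOR>]].<ext>
PATTERN = re.compile(
    r"^(?P<title>.+?)-(?P<version>\d+\.\d+)"
    r"(?:-(?P<author>[A-Z]+))?"
    r"(?:-from-(?P<parent>\d+\.\d+)(?:-(?P<parent_author>[A-Z]+))?)?"
    r"\.(?P<ext>\w+)$"
)


class Draft(NamedTuple):
    title: str
    version: str
    author: Optional[str]         # None for a merged version without initials
    parent: Optional[str]         # None if the name does not state a parent
    parent_author: Optional[str]


def parse(name: str) -> Draft:
    """Split a file name into version, author and (if stated) parent."""
    m = PATTERN.match(name)
    if m is None:
        raise ValueError(f"not in the agreed format: {name!r}")
    return Draft(m["title"], m["version"], m["author"],
                 m["parent"], m["parent_author"])


for name in ["rollingstones-4.2-PK.txt",
             "rollingstones-4.3.txt",
             "rollingstones-4.4-UM-from-4.3-PK.txt"]:
    print(parse(name))
```

A name that does not fit the pattern raises an error, which is a cheap way to spot a collaborator who has drifted from the convention.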
I’m surprised nobody has mentioned the classical “token”/”cookie” system.
The way I used to write papers with coauthors 20 years ago was using an informal token system. If I wanted to edit Section 1 of the paper, even just to fix a single typo, I had to follow these steps:
- Email all coauthors with the text “I am claiming the token for section 1.”
- Edit section 1
- Email all coauthors with my edited version of Section 1 and the text “I am releasing the token for Section 1.”
Nobody was allowed to keep the token for any section more than some agreed limit, typically 24 hours, but that often shrank to 2 hours or even 15 minutes as deadlines got closer. In principle, everyone could keep their own local copy of the paper up to date, but in practice, it was helpful for one co-author to periodically recalibrate by claiming the token for the entire paper.
As long as everyone followed token discipline, there was no need to worry about file names. There were no version disputes, because the most recent version of Section 5.4 was always by definition in the most recent email releasing the token for Section 5.4. In particular, if you branched, it was your responsibility to merge correctly, not your coauthors’.
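For anyone who thinks better in code, the rules above amount to a per-section lock with a time limit. This is only a toy model in Python: in practice the "board" was everyone's inbox, and TokenBoard, its methods and the initials used below are names I made up for illustration.

```python
from datetime import datetime, timedelta


class TokenBoard:
    """Toy model of the token rules: one holder per section, with a time limit."""

    def __init__(self, limit=timedelta(hours=24)):
        self.limit = limit
        self.holders = {}  # section -> (author, time the token was claimed)

    def claim(self, section, author, now):
        """Grant the token only if nobody currently holds it."""
        if section in self.holders:
            return False
        self.holders[section] = (author, now)
        return True

    def release(self, section, author):
        """Only the current holder may release the token."""
        holder, _ = self.holders.get(section, (None, None))
        if holder != author:
            raise ValueError(f"{author} does not hold the token for {section}")
        del self.holders[section]

    def overdue(self, now):
        """Sections whose token has been held longer than the agreed limit."""
        return [section for section, (_, since) in self.holders.items()
                if now - since > self.limit]


board = TokenBoard(limit=timedelta(hours=2))
start = datetime(2020, 5, 1, 9, 0)
print(board.claim("Section 1", "AB", start))       # True
print(board.claim("Section 1", "CD", start))       # False, AB holds it
print(board.overdue(start + timedelta(hours=3)))   # ['Section 1']
board.release("Section 1", "AB")
print(board.claim("Section 1", "CD", start + timedelta(hours=3)))  # True
```

The point of the model is that the latest version of a section is always whatever came with its most recent release, so file names never enter into it.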
On the other hand, co-authors (including both PhD students and tenured Luddites) who didn’t follow token discipline found themselves involved in fewer papers afterward.
While my paper collaboration has mostly moved to Overleaf+git, I do actually still use this system on the unavoidable but thankfully increasingly rare occasions that I need to collaborate on a Word document with someone who doesn’t have access to Word Online or Google Docs.
tl;dr: Don’t do this unless you have to.
If you want to avoid cloud solutions and use email, maybe choose a VCS which works offline (i.e. is distributed, like git) and has support for email (preferably built-in, like git).
There are many ways to set up such a workflow, here’s an example for git. Essentially: you work in git like normal, and when you want to send someone your changes you can use git send-email; after receiving an email containing changes you would like to apply (e.g. maybe after some back-and-forth discussion in reply to a git send-email message) you can pipe that email into a command like git am to incorporate the changes.
git is well-suited to use over email, since this was its original use-case and is hence the preferred and best-supported way to use it.
This doesn’t strictly answer the question, because it is about adopting no convention at all. As others have said, it is usually difficult to get authors to stick to the same system.
Assuming you use a format (e.g. MS Word) which has some sort of “track changes” feature, or a text format (e.g. LaTeX) which you can
- Let authors rename files in any way they please
- One person (perhaps unofficially) takes responsibility for maintaining some sort of continuity of the document (i.e. keeping the structure and flow OK)
- If the document versions diverge, this person uses the “track changes” feature to pull them back together
- And then emails the result to everyone saying “I’ve incorporated everyone’s changes”
- Some authors won’t work off this version straight away, especially if they were in the middle of writing something or working closely with someone else
- But eventually they will because they don’t want to be left out of the loop
- In the meantime the “maintainer” just keeps adding their new changes to their “master” document
- The key is that they don’t need to do any work to switch to the master version.
Authors that aren’t off doing their own thing will know which version to choose (the one that says “everyone’s changes are in here!”)
This has many other benefits: it saves most authors the time spent hunting through emails, removes the danger of changes being lost, spares authors the stress of continuity problems, and gives the document someone who is looking at the big picture and can discuss it with the other authors.
A “modified date plus initials” combo might help, perhaps combined with a journal abbreviation if a template is followed, e.g. “Nature 12-12 BH”. Personally, I find dates easier to track than version numbers. In any case, forking the versions must be avoided at all costs.
If GitHub/Dropbox/OneDrive etc. are not an option, an online LaTeX editor such as Overleaf might be, where each collaborator can work on one single version of the paper. Another solution is e-mailed download links, since one does not need to have an account to access a file (this has the advantage of process ownership and monitoring but includes more hassle). If lack of internet is a problem, I cannot think of anything.
tl;dr: In my personal opinion, the best option is to keep the filename unchanged at all times.
Here’s my reasoning. The filename tells us what purpose the document serves, which information it contains. The meta information (who is the author, when the last edit was made) is saved in file properties or in special fields within the file itself. The history of edits is traditionally maintained using some version control system (VCS).
In your case, you use email as your VCS. The emails are timestamped and your email client allows you to sort emails according to this date. Emails also give you information about the last author. If people send their edits by responding to the email with the version from which the edit was made, the email client keeps the whole tree of edits, just like git, allowing you to find a parent for every version. A tree of emails is a direct equivalent of a git tree. You may want to create a filter to put all emails with this file attached in a special folder to separate them from the rest of your communications. Other than that, email is already a minimalist and incomplete (no automatic merge for example), but working VCS.
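To make the "tree of emails" claim concrete, here is a rough Python sketch using the standard library's email module. It assumes replies carry the usual In-Reply-To header (mail clients normally set it when you hit Reply). The sample messages, addresses and the helper names parent_map and lineage are invented for illustration.

```python
from email import message_from_string


def parent_map(raw_messages):
    """Map each Message-ID to the Message-ID it replied to (None for the root)."""
    parents = {}
    for raw in raw_messages:
        msg = message_from_string(raw)
        parents[msg["Message-ID"]] = msg["In-Reply-To"]
    return parents


def lineage(parents, message_id):
    """Walk from one message back to the first draft, newest first."""
    chain = [message_id]
    while parents.get(chain[-1]) is not None:
        chain.append(parents[chain[-1]])
    return chain


# Invented sample thread: a first draft and two successive rounds of edits.
RAW = [
    "Message-ID: <draft1@example.org>\nSubject: paper draft\n\nFirst draft attached.\n",
    "Message-ID: <edit-a@example.org>\nIn-Reply-To: <draft1@example.org>\n"
    "Subject: Re: paper draft\n\nMy edits attached.\n",
    "Message-ID: <edit-b@example.org>\nIn-Reply-To: <edit-a@example.org>\n"
    "Subject: Re: paper draft\n\nEdits on top of those.\n",
]

parents = parent_map(RAW)
print(lineage(parents, "<edit-b@example.org>"))
# ['<edit-b@example.org>', '<edit-a@example.org>', '<draft1@example.org>']
```

This only works if people reply to the email that contained the version they edited. A fresh email with a new subject line breaks the chain, which is exactly the discipline described above.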
Since you already have a VCS, modifying filenames to code the same information is unnecessary and inconvenient, and should be avoided.
PS: And it goes without saying, email is a much poorer VCS than e.g. git, so you should at least offer your collaborators the chance to try a better system for collaboration.
If you must share files in such a way (and often you just have to, despite the many wonderful version control, cloud file-sharing and collaborative editing tools out there), I suggest a format such as cure_for_cancer-202005011030-jb.tex, where the timestamp is for 10.30am on May 1, 2020.
That makes it easy to sort multiple versions of the same file lexicographically (i.e., by file name). But of course, you’ve then got the challenge of getting collaborators to follow the same convention.
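As a small illustration of why this sorts well, here is a Python sketch that builds such names and sorts them as plain strings. The helper name stamped_name and the two extra sets of initials are my own inventions. The format itself is the one suggested above.

```python
from datetime import datetime


def stamped_name(title, when, initials, ext="tex"):
    """Build <title>-<YYYYMMDDhhmm>-<initials>.<ext>."""
    return f"{title}-{when:%Y%m%d%H%M}-{initials}.{ext}"


names = [
    stamped_name("cure_for_cancer", datetime(2020, 5, 1, 10, 30), "jb"),
    stamped_name("cure_for_cancer", datetime(2020, 4, 28, 16, 5), "xy"),
    stamped_name("cure_for_cancer", datetime(2020, 5, 1, 9, 0), "ab"),
]

# A plain string sort is also a chronological sort, because the fields run
# from most to least significant and are zero-padded to a fixed width.
for name in sorted(names):
    print(name)
# cure_for_cancer-202004281605-xy.tex
# cure_for_cancer-202005010900-ab.tex
# cure_for_cancer-202005011030-jb.tex
```

Putting the timestamp before the initials matters: if the initials came first, the files would group by author instead of by time.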
In my experience, the single biggest problem is that the file’s “last modified” metadatum often ends up reflecting the time it was “last saved/downloaded”. That means that establishing where it fits into a workflow can be a nightmare.
My solutions, which do not require computing expertise (I work in a humanities discipline, so one cannot assume everyone is comfortable using the suggestions in other answers), are as follows:
- Enter dates of recent revisions and associated author monograms in the page header manually (e.g.: “JB 30/04/2019; revised JRW 02/05/2019; JB comments 03/05/2019”). This makes the information easy to find and ensures it is included on every page of a printout (yes, I like to comment on versions by annotating a printout by hand!).
- Start all file names with the date of the version in yyyymmdd format (e.g.: “20190503_JB_comments_re_20190502_JRW_Methodology”), in order to facilitate quick sorting and unambiguous identification of recent versions on computer filesystems.
I would take a stab at semantic versioning, as suggested by the people who run GitHub.
In short: append a series of three dot-separated numbers to the end of the filename. For example: myfile-1.0.0.txt. When someone sends it back to you, whatever they respond with, you can tick up your next version to some identifier in that format which clearly counts as numerically greater.
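A rough Python sketch of "ticking up" such a name follows. Which of the three numbers to bump for which kind of edit is left open here. The helper names version_of and bump are hypothetical, and the file name pattern is assumed from the example above.

```python
import re

# Assumed file name shape: <stem>-<major>.<minor>.<patch>.<ext>
SEMVER_NAME = re.compile(
    r"^(?P<stem>.+)-(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)\.(?P<ext>\w+)$"
)


def version_of(name):
    """Return the version in a file name as a tuple of three integers."""
    m = SEMVER_NAME.match(name)
    if m is None:
        raise ValueError(f"no three-part version in {name!r}")
    return int(m["major"]), int(m["minor"]), int(m["patch"])


def bump(name, part="patch"):
    """Return the file name with one part of the version ticked up."""
    m = SEMVER_NAME.match(name)
    if m is None:
        raise ValueError(f"no three-part version in {name!r}")
    major, minor, patch = version_of(name)
    if part == "major":
        major, minor, patch = major + 1, 0, 0
    elif part == "minor":
        minor, patch = minor + 1, 0
    else:
        patch += 1
    return f"{m['stem']}-{major}.{minor}.{patch}.{m['ext']}"


print(bump("myfile-1.0.0.txt"))           # myfile-1.0.1.txt
print(bump("myfile-1.9.3.txt", "minor"))  # myfile-1.10.0.txt
```

One caveat: these names do not sort correctly as plain text, since "myfile-1.10.0.txt" comes before "myfile-1.9.3.txt" alphabetically. Compare the numbers (as version_of does) and not the strings when looking for the latest version.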