Coding philosophies

These are my code development philosophies. I think they're important or even vital, and if all scientific coders adopted them, it would be to everybody's benefit. As my act of advocacy for them, I'm listing them here.

Release early, release often

I read somewhere that if you're not deeply ashamed of the first release of your code when you look back on it later, then you released it too late. I've adopted that wholeheartedly. I think it's vital to release codes early. If you can get your code to do what you want it to do, then release it. Of course you plan to add more functionality, make it easier to use, and make sure there are no bugs in it. Releasing it only helps with those goals. Users other than you will test your code in ways you didn't anticipate, and they will break it. If they like the functionality it provides, they'll tell you it broke and ask you to fix it. The earlier you release it, the sooner you get users, and users drive further development.

Often, codes described in journal papers are mentioned as being available "by request" from the author. This is a small barrier to getting users but nevertheless one that many people won't bother getting over. Suppose your code does something similar to another code. People who use that code would probably like to try yours out to see how it compares. Put a barrier in the way of that and you'll inevitably lose some potential users.

So, if you plan for anyone else besides yourself to use your code, make its source code available for download as soon as it even vaguely does what it should. List it on so that people can find it.

License your code

A lot of codes don't have a clear license statement. If there isn't one at all, then even if the code is available, users can't assume that they have permission even to run it. Assuming you've made the source code available, then it's likely that you at least want people to run the code themselves. Any license statement at all, even if it explicitly says "you don't have the right to do anything except look at this code", is better than none.

The website is very useful with good information about the various options.

License your code freely

Any license is better than none. But the best license is one which gives your users permission to adapt your code, distribute it, and distribute their modified versions as well if they wish. If other people extend the capabilities of your code or improve how it runs, then that is to everyone's benefit. The easier your code is to distribute, the more people will use it. Two groups of licenses that give users these freedoms are the "copyleft" licenses and the "permissive" licenses. Copyleft licenses require that any distributed code which incorporates yours is released under the same terms. Permissive licenses place very few restrictions on the reuse of the code, typically requiring only that your attribution be maintained. Personally I prefer copyleft licenses for codes, and I release mine under the GNU General Public License.

The GNU philosophy outlines some reasons why freedom of use is so important.

Standardise your build

The easier it is for someone to install your code, the more likely they are to use it and contribute to its development. I've seen quite a few codes which are not written to be installed once compiled, and which will only run from the directory that the executable is in. This can be very inconvenient. Under UNIX-like systems, there's a Filesystem Hierarchy Standard, which says, basically, that your executable should go in /usr/bin, any static data files that it relies on should go in /usr/share/codename, and manual pages should go into a directory under /usr/share/man. The standard is described here.
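As a rough illustration of that layout, here is a minimal Python sketch that works out where each piece of a code would be installed. The function name install_layout, the example code name "mycode" and the choice of man section 1 (user commands) are assumptions made for the example, not part of the standard's text or of any particular build system.

```python
from pathlib import PurePosixPath


def install_layout(codename, prefix="/usr"):
    """Return illustrative install paths for a code called `codename`.

    Hypothetical helper: it mirrors the layout described in the text
    (executable, static data, man page) beneath a single prefix.
    """
    root = PurePosixPath(prefix)
    return {
        # The executable itself
        "executable": root / "bin" / codename,
        # Static data files the code relies on
        "data": root / "share" / codename,
        # Man page; section 1 is assumed because the code is a user command
        "manpage": root / "share" / "man" / "man1" / f"{codename}.1",
    }


if __name__ == "__main__":
    for role, path in install_layout("mycode").items():
        print(f"{role:12s}{path}")
```

Keeping the prefix as a parameter means the same layout can be reused elsewhere. For example, software built locally rather than installed from a package is commonly placed under /usr/local, which would be install_layout("mycode", prefix="/usr/local").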

Document your code

Documentation is crucial, but it's a tiresome and often neglected task. I've found that writing man pages is a convenient way to do it. The groff formatting used is very basic and encourages simplicity and clarity. I then use groff itself to convert the man pages to HTML, for inclusion on my web pages. Some good information about writing man pages is here.
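To give a sense of how little markup a man page needs, here is a minimal Python sketch that generates the source for a bare-bones page and builds a groff command line for converting it to HTML. The helper names, the example code "mycode", and the date and version strings are all hypothetical, and the command shown is one plausible invocation rather than necessarily the one I use.

```python
def man_page_source(name, summary, description, date="2024-01-01", version="0.1"):
    """Return the source of a minimal section 1 man page using the man macros."""
    lines = [
        # Title line: name, section, date, source and manual title
        f'.TH {name.upper()} 1 "{date}" "{name} {version}" "User Commands"',
        ".SH NAME",
        # \- is the man-page escape for a minus sign
        f"{name} \\- {summary}",
        ".SH SYNOPSIS",
        f".B {name}",
        ".SH DESCRIPTION",
        description,
    ]
    return "\n".join(lines) + "\n"


def groff_html_command(man_path):
    """Return a groff command that formats a man page as HTML on standard output."""
    return ["groff", "-man", "-Thtml", str(man_path)]


if __name__ == "__main__":
    print(man_page_source("mycode", "an example code", "Does something useful."))
    print(" ".join(groff_html_command("mycode.1")))
```

Saving the generated text as mycode.1 and redirecting the output of the printed command to a file would give an HTML version of the page, assuming groff and its HTML output device are installed.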

Package your software

Distributing pre-compiled binaries that users can install through package management systems is an extremely convenient way to distribute software. It has a couple of huge advantages. Firstly, if your code depends on other software, you can simply declare the dependencies, and users who install your code will automatically get any dependencies that are not already present. Secondly, your code will get built independently, often on a number of different architectures. This helps to ensure that your code is portable and standardised.

I started packaging my codes in a Personal Package Archive, through which users of Ubuntu and derived systems can install my codes. I then moved upstream to Debian, and worked on getting my codes into its repositories, from which Ubuntu and its derivatives are ultimately derived. I highly recommend getting codes into Debian: the process further encourages maintenance of standards, and then anyone who uses Debian or its derivatives has instant access to your code. According to an informal survey I carried out, that's about a third of astronomers. Debian's astronomy developers page is here.

Ideally I would like my codes to be in repositories available to users of Red Hat-based Linux distributions and Mac OS X, as well as to Debian users. I'm still working on understanding the relevant processes for that.