Monday, March 2, 2015

How to integrate partial version control, data exchange and research assistants?


Currently, my coauthors and I use GitHub to collaborate on code and writing, but also to exchange data. We have a lot of data, often in binary formats (e.g. PDF). Most of it is collected by research assistants, with whom we don't share our Git repo.


More specifically, we use Python, shell, R, Stata and LaTeX, and most of it is fully integrated. That is, Python and shell scripts generate the data used by R and Stata, whose output is compiled directly into LaTeX.
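To make the chain of stages concrete, here is a minimal sketch of a driver script that runs them in order and stops at the first failure. All file names are hypothetical placeholders, and the Stata invocation assumes a Unix install where `stata -b do file.do` runs a do-file in batch mode. By default it only prints the commands (a dry run), so the ordering can be checked without R, Stata or LaTeX installed.

```python
import subprocess
import sys

# Hypothetical file names: substitute your own scripts.
# Each stage produces the inputs consumed by the next one.
STAGES = [
    ("collect (Python)", [sys.executable, "scripts/collect.py"]),
    ("clean (shell)", ["sh", "scripts/clean.sh"]),
    ("estimate (R)", ["Rscript", "analysis/models.R"]),
    # Assumes Stata's Unix batch mode; adjust for your Stata flavour/OS.
    ("tables (Stata)", ["stata", "-b", "do", "analysis/tables.do"]),
    ("paper (LaTeX)", ["pdflatex", "-interaction=nonstopmode", "main.tex"]),
]


def run_pipeline(stages, dry_run=True):
    """Run each stage in order; stop at the first failing stage.

    With dry_run=True the commands are only printed, not executed.
    Returns the list of commands that were (or would have been) run.
    """
    executed = []
    for name, cmd in stages:
        print(f"[{name}] {' '.join(cmd)}")
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit,
            # so later stages never run on stale or missing inputs.
            subprocess.run(cmd, check=True)
        executed.append(cmd)
    return executed


if __name__ == "__main__":
    # Pass --run to actually execute the stages.
    run_pipeline(STAGES, dry_run="--run" not in sys.argv)
```

Keeping the stage list in one place means a change in an upstream script is propagated all the way to the compiled paper with a single command.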


We don't want to give up this high level of automation, but our approach has two main shortcomings:



  1. Research assistants cannot (and may not) upload their data directly into our repo. Instead, they send us their data and files via a separate Git repo. We want them to use Git for its excellent issue tracker, but this puts too much additional work on me, and Git is often too complicated for young research assistants (even with the GUIs).

  2. Because of the data, which changes occasionally, our repo is very large, and our internet access is sometimes restricted. Cloning the repo onto a new computer is very time-consuming and often fails. Although the raw data only changes from time to time, Git, which was not designed for data exchange, keeps track of every one of these changes, and that history is useless for our purpose.


Can you suggest other software or approaches that preserve the integration we have achieved so far while letting us exchange data easily?




