#DataSharing in #ecology – risks, rewards and expectations?

Remember taking a math test in, say, sixth grade? There was that painful requirement that you show your work. If 2 + b = 7 and a – 5 = 10, what does a + b equal? Line up those little equations and hammer them out for the teacher, because she doesn’t care whether you get the final answer; she wants to see how you got it.

Just like sixth grade all over again, the current generation of young ecologists* will have to deal with showing their work. Specifically, there is an abundance of data we are collecting and working with, much of which can be used for multiple purposes, from meta-analyses (see Christopher Lortie’s recent PeerJ pre-prints) to systematic reviews to reanalysis. In this era of big data, there is a considerable and ever-evolving discussion on how data should be shared, used and published within the ecological and larger research communities. Some people consider whether data is made publicly available to be an ethical issue. In the vein of showing one’s work on elementary school math exams, various discussions about showing one’s data have come up in the social media world lately:

A couple of days ago, Brett Favaro documented a nice Twitter discussion of several individuals’ thoughts on whether data sharing entitles a data collector, steward, or supplier to authorship.

The International Committee of Medical Journal Editors, among others, has been clearly spelling out authorship requirements for years. How these ideas will change as years’, and sometimes entire careers’, worth of data become available remains to be seen.

Gail Steinhart showed us how not archiving data can lead to, well, no data.

Today Dynamic Ecology ran Jeremy Fox’s cynical views on how data sharing will somehow make statistical analyses worse. Like many DE posts, it received a bit of a response, my own opinions included.


Ideas in Ecology and Evolution will be running a timely special issue on data sharing and open access. Many of the preprints are, not surprisingly, posted to PeerJ. For example, if you have some data to share and were wondering where to get started, Ethan White et al. have a preprint on how to ease the pain.

So, why should we share data, how should it be used, and how shouldn’t it be used? If I publicly archive my data, am I entitled to authorship on any project that uses it? Am I setting up people for researcher degrees of freedom where they might misuse the data? Should we be obligated to archive our data, as is done in Ecological Archives or on NIH grants? How do early career researchers overcome the generational inertia of older researchers who may think all data is proprietary? How do non-academic scientists develop effective data-sharing frameworks for their organizations?


I know that I plan to make available all of the data from my projects, if only I could find the time to provide sufficient metadata and post it to figshare. Data sharing raises real questions of values that merit consideration as scientists continue to write grants, collect big data and publish science. I don’t purport to have all of the answers, but if the trend of geospatial data being commonly shared and widely distributed is any indicator, we should all create some good metadata, because people want to see the data and this is a test.

*After a long month off from new posts, the ECE team has some big news. First off, Sarah has a PhD…and a job. Big ups to one of ECE’s founding scientists and good luck as she pursues her dreams! Much of the rest of the team has been busy in the field or working up deliverables for their respective projects and degrees. We aren’t going out with a whimper, we’re just hustling…and plan to find the time to hustle up some more ECE discussions in the future. Until then, get ready for #ESA2013 and work on that metadata.


4 thoughts on “#DataSharing in #ecology – risks, rewards and expectations?”

  1. Just to briefly elaborate on the content of my post which Nate kindly linked to and commented on: I think data sharing is a good thing on balance. I’m for it, not against it. My post merely suggests that increased data sharing increases the opportunity for people to make certain kinds of statistical mistakes, mistakes that also get made when people analyze unshared data. Put another way, I think increased data sharing makes it even more important than it was before that we find ways to address what others have called “researcher degrees of freedom”.

    I didn’t mean to sound cynical in taking this view, though I can understand why Nate characterizes the post that way. My hope is that the post, like many of the posts I do, is “productively contrarian”. 🙂 But of course, only readers can judge whether I achieved that.

  2. Jeremy – your post speaks for itself: there are lots of things to be concerned about with data dredging in data of all stripes. Your contrarian view in this case encourages discussion, and that’s what it’s all about. In today’s case, I think it was perhaps a misnomer of a title and that one liner that sparked my interest. Kudos!

    • As you’ll have seen from the comments, you’re not the only person who didn’t like my post title. All I can say is I thought it was fine, but post titles are obviously a matter of taste. If you try to give your posts pithy titles, as I do, eventually you’re going to come up with one that some readers don’t like. But as long as people still read and understand the post (and are glad in retrospect that they did so, as opposed to feeling like the title sucked them in to reading something they now regret having read), that’s the most important thing.

  3. I think, and many of the comments over at DE (including Jeremy’s) reflect this view as well, that the rewards of data sharing far outweigh the risks. While the problem of ‘researcher degrees of freedom’ is something to bear in mind, one can no more stop poor statistical practices from being used on “new” data than on “shared” or “re-purposed” data. And, if the data is being used outside of a traditional hypothesis testing scenario (e.g., for parameterizing process models, etc.) the problem of researcher degrees of freedom is probably lessened to some degree.

    I’ll also mention that Dryad (http://datadryad.org/) is another place for data and in my experience was pretty easy to use.
