[e2e] was double blind, now reproduceable results
Jon Crowcroft
Jon.Crowcroft at cl.cam.ac.uk
Thu May 27 23:44:50 PDT 2004
thanks for this - very useful reference, and of course as well as
privacy concerns, as david reed points out, there are patent warfare
concerns too - we just had a nice talk by hal abelson here about the
MIT ideas in open courseware and in Dspace, that led to the creative commons
ideas for how to have a more flexible approach to things, and I think
that while that might address IPR worries about background nicely and
attribution and exploitation, it didn't really cover privacy that
well...a hard problem, (and as i said one which at least the eu
medical world has coped with remarkably well) without losing complete
sight of the value of having all the data available to legitimate
other researchers to further all our knowledge...
but i would like to ask: how relevant is this to 99% of computing and
communications research? frankly, I have only twice ever had to be
careful about data in papers, and eventually (in one case in 3 weeks, once
the patent was filed, in the other case in about 3 years, when the currency of the
network monitoring data was no longer a threat to the operator if it
was in competitors' hands) we could release all the information and
code associated with the paper...
personally, i think that in our business (whatever it is)
legitimate reasons to withhold data and methods that would allow
someone to reproduce our work, will be the exception rather than the
rule, and that we should make use of something like Dspace or other
Archiv repositories to maximise knowledge rather than arguing about
the occasional reasons (at least here) not to...
i mean let's take a sample of the last 20 years of sigcomm papers - how
many of them could we confirm, and for those we cannot confirm, for how
many could we come up with a decent excuse for why not?
In missive <p06100503bcda9a14f368@[18.26.0.27]>, "Karen R. Sollins" typed:
>>
>>With respect to anonymizing and releasing data, let me recommend to
>>you a few pages of a CSTB report, "Information Technology Research
>>for Federal Statistics", National Academy Press, 2000. See
>>http://www7.nationalacademies.org/cstb/pub_federalstatistics.html.
>>The focus in that report was to a large extent on data about people,
>>so privacy was often considered paramount, but I believe the
>>techniques and state of technology with respect to statistics are
>>applicable here as well. I recommend to you pp. 34 - 40, the section
>>titled "Limiting Disclosure" in Chapter 2, "Research Opportunities".
>>The conclusion to draw from their examples is that at least in the
>>domains over which they were looking, it is not well understood how
>>to truly hide the information one wants to hide. Rather than
>>possibly misrepresenting the story to you, I recommend you read it
>>for yourselves.
>>
>> Karen
>>
>>At 10:21 AM -0700 5/26/04, Joe Touch wrote:
>>>RJ Atkinson wrote:
>>>
>>>>
>>>>On May 26, 2004, at 11:52, Joe Touch wrote:
>>>>
>>>>>Impossible is the case I was referring to. Certainly IF transforms are
>>>>>possible then they should be used and the data made available. However,
>>>>>some data sources aren't comfortable with these transforms, since there
>>>>>may be data correlation that ends up compromising the transform.
>>>>>
>>>>>Notably those that correlate data to existing Internet routing tables -
>>>>>if you found something that preserved not only prefixes but also the
>>>>>aggregation, and published it, you'd have to publish the routing tables
>>>>>similarly transformed. However, since the untransformed routing tables
>>>>>are available publicly anyway, you've compromised your transform.
>>>>
>>>> Whether the transform is compromised would depend greatly on which
>>>>particular routing tables one was working with.
>>>
>>>Yes. Bob was saying that there exists. I'm claiming there are cases
>>>where there does not exist - i.e., not for all.
>>>
>>>The issue is what to do with a paper published in the case I'm
>>>considering; we all know what to do when you CAN safely anonymize.
>>>
>>>However, note that you don't always know when it's safe - just
>>>because _you_ can't correlate the info doesn't mean someone else
>>>can't.
>>>
>>>Joe
>>>
>>
>>
>>--
>>Karen R. Sollins, Ph.D.
>>Principal Research Scientist
>>MIT CSAIL, The Stata Center
>>32 Vassar St., 32-G818
>>Cambridge, MA 02139, USA
>>V: +1 617 253 6006
>>F: +1 617 253 2673
>>E: sollins at csail.mit.edu
cheers
jon