Copyright issues when data is scraped from websites?

moonef · posted 2014-10-22 15:17:24

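I'd really like to know: if you use open-source software to scrape price information, user information, or review content from commercial websites such as YouTube, Twitter, or eBay, could that lead to copyright disputes? What if the data is used solely for academic research and publishing papers? I've recently read some papers that pulled data directly from websites with tools such as Perl and then analyzed it. Could that cause trouble down the road?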

lh07 · posted 2014-10-22 23:01:08

Allen, G. N., Burk, D. L., and Gordon, B. D. 2006. "Academic Data Collection in Electronic Environments: Defining Acceptable Use of Internet Resources," MIS Quarterly (30:3), pp 599-610.

ABSTRACT:
Academic researchers access commercial web sites to collect research data. This research practice is likely to increase. Is this appropriate? Is this legal? Such commercial web sites are maintained to achieve business objectives; research access uses site resources for other purposes. Web site administrators may, therefore, deem academic data collection inappropriate. Is there a process to make research access more open and acceptable to web site owners and administrators? These are significant issues. This article clarifies the problems and suggests possible approaches to handle the issues with sensitivity and openness.

Research access to commercial web sites may be manual (using a standard web browser) or automated (using automated data collection agents). These approaches have different effects on web sites. Researchers using manual access tend to make a limited number of page requests because manual access is costly to perform. Researchers using automated access methods can request large numbers of pages at a low cost. Therefore, web site administrators tend to view manual access and automated access very differently.

Because of the number of accesses and the nonbusiness purpose, automated research requests for data are sometimes blocked by site administration using a variety of means (both technological and legal). This paper details the pertinent legal issues including trespass, copyright violation, and breach of contract. It also explains the nature of express and implied consent by site administration for research access.

Based on the issues presented, guidelines for researchers are proposed to reduce objections to research activities, to facilitate communication with web site administration, and to achieve express or implied consent. These include notification to web site administration of intended automated research activity, description of the research project posted as a web page, and clear identification of automated requests for web pages. In order to encourage good research practices with respect to automated data collection, suggestions are made with respect to disclosing methods used in research papers and for self regulation by academic associations.
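To make the "clear identification of automated requests" guideline concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not something taken from the paper: the bot name, project page URL, and contact address are hypothetical placeholders, and the robots.txt check is an extra courtesy that the abstract does not mention.

```python
from urllib import request, robotparser

# Hypothetical placeholders: substitute your own project page and contact address.
USER_AGENT = "ExampleResearchBot/0.1 (+https://example.edu/project-info)"
CONTACT = "researcher@example.edu"


def allowed_by_robots(robots_txt: str, url: str, agent: str = USER_AGENT) -> bool:
    """Check a URL against robots.txt rules supplied as plain text."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def build_request(url: str) -> request.Request:
    """Build a request whose headers identify the automated research client."""
    return request.Request(
        url,
        headers={
            "User-Agent": USER_AGENT,  # names the bot and links to the project page
            "From": CONTACT,           # contact address for the site administrator
        },
    )
```

A real collector would also fetch the site's actual robots.txt, pause between requests, and notify the site administration beforehand, as the guidelines suggest.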

moonef · posted 2014-10-23 01:07:21

lh07 posted on 2014-10-22 23:01
Allen, G. N., Burk, D. L., and Gordon, B. D. 2006. "Academic Data Collection in Electronic Environme ...

Thanks for sharing.

zzmypster · posted 2014-10-23 02:14:33

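Public data like this doesn't seem to raise copyright issues. As for ethics, if it contains no personally identifying information, that should be fine too.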

moonef · posted 2014-10-23 14:14:07

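zzmypster posted on 2014-10-23 02:14
Public data like this doesn't seem to raise copyright issues. As for ethics, if it contains no personally identifying information, that should be fine too. ...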

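I checked the terms of several websites and found no provisions covering academic use; most of them mainly stress that the data may not be used commercially without authorization. Looks like I'll keep scraping.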
