Data Mining: Controversial Convenience

What is Data Mining on the Internet?

Advancing technology has spurred a new theory of information that hinges on the development of new computerized search engines. Daniel Cohen addresses this movement head-on in his article “From Babel to Knowledge.” Here, Cohen asserts that these search tools on the web “harness the power of vast electronic collections” and make overwhelming amounts of data usable for the public.[1] These tools have various general capabilities, such as document classification, easier scanning and sorting, and open access to reference materials.

Cohen, however, emphasizes their contribution to basic question answering, such as ‘Where to stay in Charleston, SC?’ or ‘How many oz. in 14 cups?’ He states that computer search programs are in fact most widely used for these types of general searches and not for research. The Stanford Lab Procedure, another group of advocates for computer search programs, assessed the abilities of one particular data mining service, Docuscope. Applied specifically to the analysis of literature, this program proved effective for classifying and defining aspects of a vast array of classic works. Ultimately, such a search tool would benefit members of the general public attempting to sift through these resources.

What’s the Big Deal?!

Okay, so we now know that the searches we conduct on the Internet, and the articles and e-books we read there, are possible because of these new computerized data mining programs. Pretty sweet, right? The convenience of data mining is undeniable, but it is also fairly controversial. Academia, and the humanities especially, remain undecided on whether the benefits of using computer search programs for research outweigh the controversy.

Google Books epitomizes this quagmire. For anyone who has never used it, this Google tool is a search engine that allows you to find over 12 million books in over 300 languages in digital form.[2] As with the other computerized search engines, there are two sides to this story. On the one hand, Google Books has achieved an enormous amount of digitization in a short time and resurrected books that had been buried and neglected for decades. On the other hand, this content is uploaded and provided to the public in small parts, oftentimes without the permission of publishers. This has spurred widespread criticism from major figures in the digital and publishing world. Pat Schroeder, president of the Association of American Publishers, is one person who called Google Books “intellectual embezzlement.”[3]

So… how would you feel if your published work were mined by a site like Google and made available to the public, if only in small portions? Do we as students and members of the public feel that an expansive digitized data collection is worth more than claims under copyright law?


[1] Cohen, Daniel. Roy Rosenzweig Center for History and New Media, “From Babel to Knowledge: Data Mining Large Digital Collections.” Last modified 2006. Accessed February 24, 2013.

[2] Parry, Marc. “The Humanities Go Google.” Last modified 2010. Accessed February 24, 2013. http://marcparry.org/articleschegoogle/.

[3] Baksik, Corinna. Harvard University Library, “Fair Use or Exploitation? The Google Book Search Controversy.” Last modified 2006. Accessed February 24, 2013.

2 thoughts on “Data Mining: Controversial Convenience”

  1. At this point, I probably would not care all that much if portions of my published work were made available on places like Google Books. Why? Because Google has the potential to increase awareness of my work, and therefore to increase the possibility of someone going out and actually purchasing it. Moreover, it increases the possibility of an employer finding my publication and then deciding to hire me because of what they discovered online through data mining. Sorry, my frustrations as a broke graduate student are causing me to answer questions from a quasi-Marxist perspective.

  2. The issue of fair use is of course a major one in the Google Books controversy. What do you think, given what we have read? On the other issue, it seems that the biggest critique of the promise of data mining is whether it will indeed teach us anything new. A significant amount of resources has been poured into these efforts and, as of yet, no major groundbreaking interpretations have emerged.
