What is Data Mining on the Internet?
Advances in technology have spurred a new approach to information that hinges on the development of computerized search engines. Daniel Cohen addresses this movement head on in his article From Babel to Knowledge. Here, Cohen asserts that these search tools on the web “harness the power of vast electronic collections” and make overwhelming amounts of data usable for the public. These tools have various general capabilities, such as document classification, easier scanning and sorting, and open access to reference materials.
Cohen, however, emphasizes their contribution to basic question answering, such as ‘Where to stay in Charleston, SC?’ or ‘How many oz. in 14 cups?’ He states that computer search programs are in fact most widely used for these types of general searches and not for research. The Stanford Lab Procedure, the work of another group of advocates for computer search programs, assessed the abilities of one particular data mining service: Docuscope. Applied specifically to the analysis of literature, this program proved effective at classifying and defining aspects of a vast array of classic works. Ultimately, such a search tool would benefit members of the general public attempting to sift through these resources.
What’s the Big Deal?!
Okay, so we now know that all the searches we conduct on the Internet, and the articles or e-books we read there, are possible because of these new computerized data mining programs. Pretty sweet, right? The convenience of data mining is undeniable, but it is also fairly controversial. Academia, and the humanities especially, remain undecided on whether the benefits of using computer search programs for research outweigh the controversy.
Google Books epitomizes this quagmire. For anyone who has never used this Google tool, it is a search engine that allows you to search over 12 million books in over 300 languages in digital form. Just as with other computerized search engines, there are two sides to this story. On the one hand, Google Books has achieved an enormous amount of digitization in a short time and resurrected books that have been buried and neglected for decades. On the other hand, this content is uploaded and provided in small parts to the public, oftentimes without the permission of publishers. This has drawn widespread criticism from major figures in the digital and publishing world. Pat Schroeder, president of the Association of American Publishers, is one person who called Google Books “intellectual embezzlement.”
So… how would you feel if your published work were mined by a site like Google and made available to the public, even if only in small portions? Do we, as students and members of the public, feel that an expansive digitized data collection is worth more than copyright claims?