Ronnie Dove has started writing a book about web crawler technology for cancer research. The book dives into the technical challenges of exploring the World Wide Web for accurate cancer-related material in order to help patients, doctors and researchers, and in particular the challenge of sifting through massive amounts of internet data to find useful cancer research. It covers the possibility of using patient-created blogs, messages and other accounts of their experience with cancer to build a trend analysis that serves as a second source of historical knowledge about different types of cancer. It also discusses the possibility of gathering only respectable sources of information into a portal for patients by identifying certain keywords within both the URL and the metadata of each website that the web crawler explores. The book is set to be released by Amazon.com in the future.
Ronnie is also looking to release a fully functional web crawler that can meet the challenges outlined in the book. The source code for the web crawler will be made available to the public. A very early prototype is already being used as a test.
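To make the keyword idea above more concrete, here is a minimal sketch of how a crawler might check a page's URL and metadata for keywords. It is not the prototype's code: the keyword list, the function names and the assumption that metadata arrives as a dictionary of meta tag names to content strings are all hypothetical, chosen only for illustration.

```python
from urllib.parse import urlparse

# Hypothetical keyword list; the actual crawler's criteria are not published.
KEYWORDS = ("cancer", "oncology", "tumor")


def keyword_hits(url, metadata, keywords=KEYWORDS):
    """Return the keywords found in the URL and in the page metadata."""
    parsed = urlparse(url.lower())
    # Only the host name and path are inspected, not the query string.
    url_text = parsed.netloc + parsed.path
    # Assumption: metadata is a dict of meta tag name -> content string.
    meta_text = " ".join(str(value).lower() for value in metadata.values())
    return {
        "url": [k for k in keywords if k in url_text],
        "metadata": [k for k in keywords if k in meta_text],
    }


def looks_relevant(url, metadata, keywords=KEYWORDS):
    """Accept a page only when both the URL and the metadata match."""
    hits = keyword_hits(url, metadata, keywords)
    return bool(hits["url"]) and bool(hits["metadata"])
```

Requiring a match in both places is one possible design choice: it rejects pages that mention cancer only in their description while living on an unrelated site. Keyword matching alone cannot establish that a source is respectable, so a real crawler would likely combine it with a reviewed list of trusted domains.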

We’re currently researching the benefits of using Hadoop MapReduce for cancer-related research. In particular, we’re comparing the processing power of MapReduce-based algorithm enhancements against current calculation-based processing methods.
The Hadoop platform was designed to solve problems where you have a lot of data, perhaps a mixture of complex and structured data, that doesn’t fit nicely into tables. It’s for situations where you want to run analytics that are deep and computationally intensive, like clustering and targeting. That’s exactly what Google was doing when it was indexing the web and examining user behavior to improve performance algorithms.
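The MapReduce model behind this can be sketched in a few lines of plain Python. The example below does not use Hadoop or any Hadoop API; it only imitates the map, shuffle and reduce steps in a single process, and the tracked terms and function names are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical terms to track; a real job would load these from configuration.
TERMS = {"melanoma", "leukemia", "lymphoma"}


def map_phase(doc_id, text):
    """Map step: emit a (term, 1) pair for every tracked term in one document."""
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in TERMS:
            yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: sum the counts collected for one term."""
    return key, sum(values)


def run_job(documents):
    """Run map, shuffle and reduce over a dict of doc_id -> text."""
    pairs = (
        pair
        for doc_id, text in documents.items()
        for pair in map_phase(doc_id, text)
    )
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

On a real cluster the map calls would run in parallel on the machines that hold each block of data, and the framework would handle the shuffle across the network. The sketch keeps everything in memory to show only the shape of the computation.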
Doveshope has teamed up with Dovestech, LLC on an important search engine and web crawler project. We will be using portions of Dovestech’s search engine and web crawler source code as a framework for our future cancer research search engine. We’re creating a search tool that lets a user search the internet for cancer-related documents and material using only resources that are respected within the medical community. We currently have a working prototype with a large amount of accurate data.


