Diving the Deep Web: The Internet You Did Not Know Existed

It is well known that when you see an iceberg in the ocean, you are only seeing about 10% of its total mass. The remaining 90% extends below the water’s surface, reaching depths of 600 feet or more. The Internet is similar to an iceberg because only about 10% of it is “visible” to most users. The part of the Web that is invisible to conventional search engines, and in some cases inaccessible through traditional browsers, is called the “deep Web.” Like the depths of the ocean, the deep Web is a little-known world that contains dazzling and shocking creations as well as vast swaths of emptiness.

The deep Web evades the notice of most traditional search engines. Even though Google may seem ubiquitous and omnipotent when a search produces millions of results, it is unable to reach the majority of websites online. Search engines such as Google, Yahoo! and Bing use Web crawler (or spider) programs that browse the Internet and copy pages, which the search engine later indexes and makes accessible to users. The crawlers go through URLs and search for all the hyperlinks on web pages, adding them to their “crawl frontier,” the list of websites and pages still to be visited and indexed. Crawling the deep Web is a challenge for this type of software because it relies on conventionally formed URLs and the presence of tags or text. The companies that operate the popular search engines have become aware of the shortcomings of crawlers and are looking for ways to improve them.
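To make the crawl frontier less abstract, here is a minimal sketch in Python. It is not how Google or any real search engine is built. The three linked pages, the unlinked "archive" page, the example.test addresses, and the names crawl, LinkExtractor and TOY_WEB are all made up for illustration, and the "web" is just a dictionary so nothing is actually downloaded.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag found in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(fetch, seed_url, max_pages=100):
    """Breadth-first crawl. `fetch` maps a URL to HTML text, or None if missing."""
    frontier = deque([seed_url])  # the crawl frontier: URLs waiting to be visited
    seen = {seed_url}             # URLs already queued, so nothing is queued twice
    indexed = []                  # pages the crawler managed to reach and copy
    while frontier and len(indexed) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue
        indexed.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links against the page URL
            if absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return indexed


# Hypothetical toy web (assumption): /archive exists but nothing links to it.
TOY_WEB = {
    "http://example.test/": '<a href="/about">About</a> <a href="/maps">Maps</a>',
    "http://example.test/about": '<a href="/">Home</a> <a href="maps">Maps</a>',
    "http://example.test/maps": '<a href="/">Home</a>',
    "http://example.test/archive": "<p>Old papers that no page links to.</p>",
}

if __name__ == "__main__":
    for page in crawl(TOY_WEB.get, "http://example.test/"):
        print(page)
```

Starting from the home page, the sketch reaches the "about" and "maps" pages because links lead to them. The "archive" page is never visited: it exists, but no hyperlink points to it, so it never enters the frontier.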

A significant reason crawler software has such difficulty with the deep Web is that a portion of it is composed of dynamic pages that exist only when someone types a query into a database. This format makes it challenging for a person, let alone mindless software, to discover such pages. Some users report that the majority of the deep Web is in the form of these databases; however, this is merely speculative. The popular search engines are also unable to index some websites because their content is scripted and generated dynamically (such as with Flash) or is in a format other than standard HTML. Websites can also choose to block crawlers by requiring users to log in or by implementing other features such as CAPTCHAs.
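A small sketch may help show what "a page that only exists when someone types a query" means. Everything here is an assumption for illustration: the in-memory SQLite catalog, the three paper titles, and the names build_catalog, render_results and SEARCH_FORM are invented, and a real database-backed site would be far more involved.

```python
import sqlite3
from html import escape


def build_catalog():
    """Creates a tiny in-memory database standing in for a site's hidden catalog."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE papers (title TEXT, year INTEGER)")
    conn.executemany(
        "INSERT INTO papers VALUES (?, ?)",
        [
            ("Iceberg Drift Models", 2009),
            ("Benthic Fish Survey", 2011),
            ("Deep Sea Lighting", 2012),
        ],
    )
    return conn


def render_results(conn, query):
    """Builds the results page on demand; no such page is stored anywhere beforehand."""
    rows = conn.execute(
        "SELECT title, year FROM papers WHERE title LIKE ? ORDER BY year",
        (f"%{query}%",),
    ).fetchall()
    items = "".join(f"<li>{escape(title)} ({year})</li>" for title, year in rows)
    return f"<h1>Results for {escape(query)}</h1><ul>{items}</ul>"


# The only static page a crawler would see: a search box with no hyperlinks to follow.
SEARCH_FORM = '<form action="/results"><input name="q"><input type="submit"></form>'

if __name__ == "__main__":
    catalog = build_catalog()
    print(render_results(catalog, "fish"))
```

The search form contains no links, so a crawler that only follows hyperlinks has nowhere to go from it. The page listing the fish survey comes into being only when someone supplies the word "fish," which is the situation described above.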

People who access the deep Web have reported their findings on forums on the “surface Web” like explorers returning from strange countries. Their tales generally fall into one of three categories. The first group consists of people who were unimpressed with their virtual excursion and claim that the deep Web is just a 1990s version of the surface Web. People in the second group have ended up unintentionally gazing into the dark abyss of pedophiles, assassins, crime rings, human experimentation, etc. and warn everyone away from the deep Web. The third group consists of people who have found truly interesting things such as scientific papers, discussion groups, e-books, blogs, and tech communities, and want to share these with those on the surface Web.

Having made forays into the deep Web, I can say that these impressions are all fairly accurate. Hunting around for deep websites can be a frustrating experience because many of the websites are boring, badly scripted or unfinished. Furthermore, slow servers cause these sites to take up to ten minutes to load, which exacerbates the frustration. There are websites on the deep Web that are truly horrific and cater to the darker elements of our society and world. However, to avoid such sites, all a person has to do is take hyperlink descriptions seriously and resist clicking on those that seem too disturbing to be real. With enough persistence and caution, one can find a treasure trove of academic reference material, newspaper articles, maps, data engines and more.

The companies that own the popular search engines are aware of the untapped resources that the deep Web holds and are attempting to improve Web crawlers so they can access the hidden websites. Understandably, there are websites on the deep Web that do not want to appear on the surface Web because being visible in such a way would identify their owners and users. Like shining light into the darkness of the deep sea, the entire ecosystem of the deep Web would change if it became accessible through Google. Perhaps the world would benefit from being able to easily access information on the deep Web, but at what cost?

For an answer to this question, stay tuned for the second part of “Diving the Deep Web.”

Note: If you are curious about the deep Web, visit the Tor Project, download the browser, and see the guide listed under Sources (3) for step-by-step instructions.

Sources

1) Spetka, Scott. “The TkWWW Robot: Beyond Browsing.” Wayback Machine. Internet Archive, Sept. 2004. Web. 15 Sept. 2012.

2) https://www.torproject.org/

3) http://thebotnet.com/guides-and-tutorials/49828-how-to-access-the-hidden-wiki/


31 thoughts on “Diving the Deep Web: The Internet You Did Not Know Existed”

  1. Fascinating article! I had heard rumours of a deep web some time ago and confess it stirred my interest in the same way as hearing of uncharted regions of the deepest oceans where human divers and manned submarines cannot yet go also grabbed my interest. It always sounded like a world where your imagination could run wild thinking of what might be there.

    Sadly it sounds like as well as some really fascinating material there are also some groups there who are there for a reason, they don’t want to be found and don’t want any attention drawn to what they do. But then that’s not unlike the surface web, there is good and bad. Still, if shining a light there will reveal truly useful information, hopefully it will also root out activity that needs rooted out and banished.

    Thanks for sharing this article 🙂

    • Thank you so much! I am glad you like the article 🙂

      I felt the same way about the deep Web. The mere idea that there is something unknown, uncharted and undiscovered out there sparks the imagination. It is exciting to know that we exist in a world where there are still mysteries and places to explore – it is also interesting that we, i.e. humans, have even created “worlds” that are hidden to others.

      That is true: there are people there who have less than noble purposes for going there. There are bad things on the surface web to be sure. I am surprised that some stuff is even on the surface web and hasn’t been taken down.

      In an ideal world, shining a light there would reveal information, banish the horrific things that are there and preserve the essence of what the deep Web is.

      Thank YOU so much for taking the time to write such a thoughtful and insightful comment! I am glad to have found someone who is also an adventurer at heart. 🙂

      • Your words about humans creating “worlds” that are hidden to others struck me. I guess my instinct is to be suspicious, thinking that if a world is hidden then there are reasons, and not good ones, why it’s hidden. Then I thought it’s so curious that such a hidden world can contain some very good things as well and that in itself sparks the curiosity. What kind of things could be hidden whilst still being good and for what reason? (I feel a science fiction book coming on…)

        I agree, it’s actually quite baffling why some surface web material hasn’t been removed. I think I’m right in thinking the internet is supposed to be ‘self-governing’ (could be wrong) but I’m sure self government should cover less than appropriate material.

        Never thought of myself as an adventurer! But then I have always liked to ‘think’ at the boundaries of what I know, reach the edge of the known and look beyond. Not always fruitfully but the journey can be immensely intellectually stimulating.

        One other thought I had about your blog, you make something that could be quite difficult to understand, easy to get to grips with. Many try and fail to make the complex easier to grasp but you succeeded, a rare talent indeed. 🙂

        • It does seem that more often than not, at least in the physical world, people create hidden worlds or networks because they are doing something nefarious. But there are examples of people creating secret societies just to feel like they are doing something special. Also, sometimes people unintentionally create hidden worlds – take academia for example. The “hidden” factor may arise out of the complexity or esoteric nature of the work. A lot of university databases are accessible through the deep Web and are only hidden because they are databases that do not respond to traditional search engines. However, because of the “charged” nature of the content of some hidden websites, it does tend to cast a shadow over all of the others – at least that has been my experience. Whenever I am waiting for a deep website to load, even if I am pretty sure I am just going to see an article or a paper, I do get anxious about what I could possibly see (and then never un-see).

          True, the surface web is self-governing for the most part. There are some child protection acts or generally laws about selling drugs or weapons that give the federal government the ability to take down websites and prosecute owners.

          Pushing the boundaries of thought! Possibly one of the most rewarding types of adventure in our current age!

          That is very high praise! Thank you 🙂 I definitely got to the point where I was stressing about the fact I was writing about how software works and how “out there” that could be. I am glad it was clear! Thank you so much for the validation.

  2. Just replied to your comment on my own page so as I’m passing I’d like to ask a question if I may? Your article on the deep web has ‘kind of’ confirmed something I heard/read some time ago. What I remember is that the internet is a vast cyber world and that the world wide web is only part of it. Is that right? I always thought that the www WAS the internet, that the two were one and the same but what I read led me to think the www operated within the internet but was only one part of the internet. Sorry, not explaining this very well, you are way better at simplifying ‘tech stuff’ than me! I remembered this after reading your article that the deep web is like another world ‘within’ the internet that goes beyond what can be ‘easily’ accessed via traditional methods. Basically I think I’m asking if the www operates within the internet and is a part of the internet, but isn’t synonymous with the internet. Sorry…feel free to have a coffee before any reply…

    • Thanks for the question! It is a really good one! You are right: the world wide web and the Internet, although the terms are often used interchangeably, are not the same thing. The Internet is the massive network of networks and infrastructure that connects computers all over the world. The world wide web is a way of accessing and sharing information online, and it operates through the HTTP protocol. There are several “languages” in which information is communicated over the Internet, which are called protocols. The www uses browsers that are designed to access pages that are hyperlinked. The www is a popular portion of the internet, and I think the part of it that traditional search engines can reach could be described as “the surface web.”

      I hope that answered your question!

      • You have answered my question and have made it clearer to me thanks. I have always been somewhat confused by the distinction and had always assumed the two were synonymous.

        I confess the deep web has me interested and I’m curious as to what might ‘live there’ that would be of interest in my own fields. Strange thing is that because the deep web is ‘hidden’, it seems very secretive and even thinking about looking into it makes me feel like I’m thinking of doing something illegal! I guess that’s because it’s so unknown and, almost but not quite, gives the impression of being ‘off the grid’!

        Wanted to repeat as well what I said in a previous comment, you clearly have a gift for rendering topics which can be complex and prohibitively technical, more understandable to those who would find it difficult. I have tried to read a few popular science books on various subjects where the author is said to make tough to grasp topics easier for the ‘normal’ person to understand. I admit reading some of the ‘easier to understand’ bits still left me clueless!

        I look forward to your first popular science book!

        Thanks again for your explanation.

        • Excellent! I am glad the answer made sense!

          I can sympathize with that feeling. Every once in a while I get the panicky feeling about whether someone is going to burst into my house because I am on the deep Web. It is totally irrational and completely paranoid.

          Thank you so much. That really touches me. I really respect your writing skills which you demonstrate in your longer posts. You know how to capture experiences and feelings which I always thought were beyond words. It is my highest goal to make the more technical stuff accessible and interesting. I have had that feeling myself with some of the books out there. There are some pop-psychology books where I have actually had to turn to textbooks to explain the “normal person” explanations! Your words are making me think more seriously about directing my non-fiction writing towards explaining technical topics… I will keep you posted on any developments in that area 🙂

          Thank you again. I will definitely be turning to your comments when I have moments of doubt. They are truly a gift as is having virtually met you 🙂

          • I have an answer ready if the secret police break down the door as I type ‘deep web’, I’m gonna say if they’d held back a few seconds they’d see I was in the middle of typing ‘deep webinar’ as I wanted to hear a talk on the deep web without visiting it. If I hear you’ve gone to prison for deep web surfing, I’ll post bail for you 🙂

            Thanks for your kind words re my own writing. I’m a work in progress but it’s nice to hear kind words. Sounds like you have the ability for success whatever you do. I think whatever your expertise, as long as you enjoy what you’re doing then your mind will open up pathways for you and opportunities will come. Explaining tech topics would be a service to many people.

            Glad to have virtually met you too, I’ll get you a coffee as I’m having one myself but will have to drink it for you 🙂

          • That is a good plan and thank you for the bail offer 🙂 Hopefully it will never come to that. I would definitely use your idea first. Thank you for imbibing coffee for me! I am sure it contributed to my overall productivity!

  3. @Rebecca – Love the topic. Having myself spent hours on Google performing site checks of my own website and submitting “sitemaps” so Google can find my pages correctly, I can understand the frustration. Most of the time Google finds missing or unknown pages, especially when I have edited a post too many times.

    Used to do a “site maintenance check” on Google constantly, now I have abandoned doing so, as it takes way too much time.

    Have been into the “deep web” and I was in the group of disliking, due to way too much really just what I refer to as “illegal porn.”

    Just really didn’t deem going into the deep waters (as you described) worth the awful sites, that pop up at times.

    Do remember one occasion of such an instance where I was assisting my daughter with a paper for school, and the most mortifying website just popped up out of nowhere in the deep web. Wish my daughter had never seen such things, as my search wasn’t even closely related to the topic.

    However, often do searches there for academic papers, as love to cite academic papers as sources for reports, as a nice change from wiki, government sites, as the college reports from professors seem to have a stated opinion of the subject, and are well written as the prof is of a higher education.

    Love this writing style Rebecca, found the article “flowed” nicely, also enjoyed the shorter post, with two parts, makes for easier reading (to me).

    Reminds me of how the FBI, police forces (specializing in internet crime) and the like often dive into the deep web (as I have seen on documentaries and read about) although unsure of this to be stated as fact, due to the FBI, et al, keeping such things top secret in the United States, as such the above statement is written as a theory, and not as a fact of knowledge regarding the FBI or police.

    Wonderful topic, interesting and great article.

    • Google does have some issues with updating quickly. It has sometimes been weeks before it gets itself together to update my articles. It is a pain to have to monitor Google and see how it is dealing with one’s site.

      I had no idea you had done some deep web exploring! That is exciting, but really unfortunate that you came across that type of illegal pornography which can make you pretty depressed about what people are capable of.

      I can understand the fact that it forever turned you off of the deep web. If I ever stumbled across those sites, I probably would never go back. The only reason I have been so lucky is because I knew about that potential going in. However, it is difficult to avoid all disturbing things.

      Oh yikes. That is horrible. I am so sorry for you and your daughter. That must have been awful.

      It does have great research material! It does get tiring going through government websites or wikipedia where I sometimes wonder what is accurate and what is an exaggeration or just wrong.

      I have also heard about how the FBI patrols the deep web. I have heard that if you go in without some sort of proxy to hide your IP, the police will “follow” you around to see what you are doing. Of course, this is not a confirmed theory. It is only something I have heard repeatedly on various forums. I do recall reading one story where the FBI “sat” on an exit for the Tor network and traced back to find thousands of IP addresses and see who was downloading copyrighted material. Scary stuff, but kind of cool!

      I am so glad you liked the article and the writing style! 🙂 I am going to try to break up topics more so I can say more and post more frequently.

      • Funny you mention that Rebecca, reminds me of a post I did long time ago, regarding Facebook and Human Trafficking. Noted getting stats and referrals from a government site, could tell it was the FBI that had obviously picked up my post (not confirmed) but it was obvious to me considering the referral information, where it says “referrers” in the stats.

        The site didn’t make me want to leave the deep web forever, just due to, as you stated, wonderful cite sources, prof’s papers I find intriguing, as they sometimes just do them for fun and not for a requirement of any type.

        This may sound odd, but a viable question on the subject:

        Rebecca, noted that my iPhone picks up more of the deep web than my laptop, do you know why this might be? Is it due to my iPhone using Safari as the search engine?

        Seems odd to me….almost like Safari picks up so much more of deep web than Google or Bing.

        Bing is a horrible search engine, in my opinion and never use it.

        Most of my cited sources are found from the iPhone which seems to pick up so much more, especially from the iUniversity app, that is now free and has a ton of prof papers in free PDF’s. Love that app, the iPhone really is wonderful with that new iUniversity app, and other quote apps, research apps that go into the deep web, and make it easier.

        The iPhone with research apps, seems to have made diving a bit easier.

        • That is such a good question! There are several reasons why your iPhone can pick up more deep Web sites than your laptop.

          1. Your laptop cache/web history is probably large and contains a lot of tracking information that is trying to help you find websites that it thinks you would like. If you have searched a lot for websites unrelated to your research, this may prevent your browser from turning up what will now be relevant for you. I would try clearing your cache and running the same search both on your phone and on your laptop and seeing if they are still different.

          2. Bing sucks! (I have irrational rage towards it) It is very ad-driven and uses some sort of software to track your searching habits and suggests things it thinks you will find relevant, regardless of what you are now searching for. I don’t like using it. I would be interested to see what happens if you use Safari on your laptop (if you have mac, that is or even someone else’s), clear the cache on the laptop and your phone and run the same search.

          3. This point is related to point #2. Safari is special because it is much better at reading JavaScript than other browsers. Many deep Web sites are in JavaScript so, naturally, Safari will be better at turning up these sites than Internet Explorer (which uses Bing) or Chrome.

          4. If your iPhone and laptop were both using Safari and were still turning up different results after clearing your caches, it could be that your phone and laptop have different IP addresses. If your phone were using AT&T’s network, your phone probably would not have its own IP address, but one that your carrier provides. This enables ISPs to determine how much data you are accessing on your phone. This makes sense because you would not want your home’s Internet access being confused with your phone’s. If you did not have an unlimited data plan, your bill would be enormous at the end of the month.

          5. Depending on the network your phone accesses to use the Internet, your phone’s search results may differ from your laptop’s because some sites don’t allow access from certain networks.

          6. I haven’t heard of the iUniversity App, but I would love to look into it if you have any information/ a link to it. It could be that instead of looking for popular tags or text, it generates queries to deep Web databases.

          So the FBI was most likely using your post as a resource? That is pretty cool!

          Thank you so much for the great question! 🙂 The difference between your laptop and phone made me very curious!

          • Oh Rebecca, I am so sorry, gave you the wrong app name….Guess it was filed in my brain under the completely wrong name. This is a fairly newer app by Apple that I love. Here is the link:

            http://itunes.apple.com/us/app/itunes-u/id490217893?mt=8

            I don’t know if you have an iPhone, but in love with mine, because even though I have the iPhone 4, apple sends me the software for the newest version of the iPhone software automatically and for free.

            Love the “find my iPhone” free app too, as it geographically locates my phone on a map .

            I am also able to locate our children at any time of day, via google satellite as long as iPhone is on and they have it with them.

            The find my iphone app actually allows me to see their whereabouts without them knowing, and if the phone is stolen, I can remote wipe it and also make it do a super loud siren noise (this is free with the find my iPhone app).

            My iPhone with AT&T does have an IP Address, the other awesome thing about it is that only with AT&T, my friends can legally tether to my phone and use it as a wi-fi hot spot (this is free with my highest data plan).

            For example: A group is at a basketball game, I just simply turn on “wi-fi personal hot spot” on my phone and everyone can use my wi-fi.

            We do have the highest data plan, I do not use the wi-fi at home for the phone, because like using 3G, runs so fast on the iPhone like a dream.

            Also love playing the games with other iPhone users across the world like checkers, Pictionary, etc. that is really fun while waiting on a plane, bus, etc. Kids just love that feature too.

            Safari seems to be a wonderful search engine, and does work the same on a laptop (my mom uses it). Bing is the worst I have ever used – agree with you.

            The wonderful iPhone apps have made researching great, especially the free Podcasts etc. and iBooks app with the free ibooks.

            I did hear that Apple helped with “drone” planes (spy planes) technology for the US Government, unsure if this is true or not. My husband told me this, not sure where he read it or saw the information.

            Spy technology has always been an interest of mine.

            I really enjoyed this post, and like the opinionated Rebecca!

  4. Great article as per usual. What is funny…and a little strange is that I posted content earlier this week on search engine’s biasing research that also had a focus on the deep web. The ocean metaphor is there too. You chose icebergs and I chose sunny beaches and benthic fish.

    Again you did a really great job of pulling this topic together. I have been interested in demystifying the deep web for a long time (I am in the third group). It is hard to crack, intelligence agencies would prefer you to not be down there, and once again the media picks up on the gritty parts only. All aspects that beckon to fringe explorers….like a siren song.

    One of the coolest parts about the Deep Web is that you can often pull up resources that will be so unexpected as to oblige you to “reframe” your position on whatever it is you are researching. Kind of like being knocked to the ground only to find a $100.00 bill. All of a sudden there a lot more options open to you.

    Thanks for writing.

    • Wow! Maybe we were in touch with the same writing muse! I am going to read your post!

      Thank you so much for your kind words! I am glad that it came together and was enjoyable to read. I am in the third group too. The deep web is fascinating and it does have a certain siren song. I am thinking of building a server this weekend so I can be down there more safely. I will let you know if I come across anything worthy of sharing.

      Yes! I know that feeling. It is the sort of place that encourages radical paradigm shifts. The people down there are interesting too. I’m not referring to the more criminally inclined, but the “explorers” you can find on forums.

      Thank you so much for reading and also sharing your experiences with the deep web. It is exciting to discover so many people who are also on the fringe 🙂

  5. Sorry Rebecca, think I was overly sleepy when I wrote the last comment regarding the Safari search engine.

    Safari on phone – brings up more search results of the deep web (that I can locate quickly).
    Safari on my mom’s computer – brings up the same search results (but takes me much longer to locate)

    I should have been a bit more clear about the Safari search engine iPhone vs. laptop in regards to the deep web.

    • No worries! I am sorry for the late reply! I had a pretty major deadline for work yesterday and that sucked up all my time.

      I don’t have an iPhone, though I really wish I did. My roommate has one and I am really jealous. I am waiting for my Blackberry to expire before getting an iPhone. That hot spot feature sounds amazing. I wonder how it works… I will look into it.

      The iTunes U sounds like an amazing app. I guess Apple has a deal with certain schools and universities. It probably connects you to the universities’ databases.

      I guess we have our answer about why your phone and laptop give you different search results! 🙂 I was seriously puzzling about this for a couple days. I think it must be because Safari is able to read more scripts than Chrome or IE or Firefox. It probably takes longer on your mom’s computer because you have to sift through the results generated by her browsing history. Thank you for giving me a fun research project!

      I too am interested in spy technology and also spy networks. This is probably a dangerous interest, but I am thinking of researching how counter intelligence people use the Internet for espionage.

      I am glad you like the opinions! I think you are rubbing off on me! 🙂

I would love to hear your thoughts, ideas and questions! I will make sure to visit your blog (if you have one) and make substantive comments on your posts. Thank you for reading!
