Dropbox is a well-known online service that lets you share files between computers. Although new competitors have entered the same market in the past few months, Dropbox remains number one. Files are synchronized between Dropbox software clients, but Dropbox also provides a way to share files with third parties who don’t have a Dropbox account. How? By creating “links” to those files. It’s easy: in your Dropbox folder, select a file, right-click and select “Dropbox -> Get Link“. Your direct URL will look like this: “http://www.dropbox.com/s/wg0ih0qywujn77y/myfile.zip“. Then share the URL with your peers, who just have to point their browser to it to access your file. Easy!
But if your files are available via HTTP(S), anybody who has the URL can access them. We just have to guess valid URLs. Guessing the 15-character strings by brute force is doable but would waste a lot of time. Where can we find plenty of existing URLs? In search engines, of course!
I wrote a Google crawler and let it run for approximately ten days. It was not easy. Google may be a champion at grabbing our data, but it doesn’t allow extensive automated use of its search engine! You are often blacklisted and have to fill in a CAPTCHA: Google presents a “sorry page” where you must prove you are not a bot.
But some techniques can be implemented to evade their tests:
- Search across multiple TLDs (easy, they have all of them 😉)
- Change your User-Agent string randomly
- Use open-proxies randomly
- Do NOT use Tor, they blacklist the exit-nodes
- Use other anonymizing services (like anonymouse.org)
- Add random sleep() between queries
My crawler searched for pages containing “http[s]://[dl|www].dropbox/s/*“. For every hit returned by Google, the corresponding page was also visited and parsed to extract the Dropbox shared links. Finally, all the URLs found were visited (500,000+ pages were processed) and the data downloaded. Of course, a lot of pages provided the same content or the same links (for example, conversations in forums and mailing-list archives).
Interestingly, when I downloaded all the files in bulk from Dropbox, I did not use any special techniques like the ones needed to search on Google. And I was never blacklisted! I wonder whether Dropbox has any controls in place. Did they see my traffic?
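The link-extraction step can be sketched in a few lines of Python. This is only an illustration, not the original crawler: the regular expression, the `extract_dropbox_links` name and the assumption that the identifier is always 15 lowercase letters or digits (as in the example URLs in this post) are all hypothetical.

```python
import re

# Assumed pattern: http or https, host "dl" or "www", then /s/<15 chars>/<file name>.
# The 15-character lowercase alphanumeric identifier is inferred from the example URLs.
DROPBOX_LINK = re.compile(
    r"https?://(?:dl|www)\.dropbox\.com/s/[a-z0-9]{15}/[^\s\"'<>]+"
)

def extract_dropbox_links(page_text):
    """Return the unique shared links found in page_text, in order of appearance."""
    seen = set()
    links = []
    for match in DROPBOX_LINK.finditer(page_text):
        url = match.group(0)
        if url not in seen:  # the same link often appears several times on a page
            seen.add(url)
            links.append(url)
    return links

sample = (
    '<a href="https://www.dropbox.com/s/388v3j55z4210e1/test.php">code</a> '
    "see http://www.dropbox.com/s/wg0ih0qywujn77y/myfile.zip and again "
    "http://www.dropbox.com/s/wg0ih0qywujn77y/myfile.zip"
)
print(extract_dropbox_links(sample))
```

Deduplicating while scanning matters here because forum threads and mailing-list archives tend to repeat the same link many times. Note that a link immediately followed by punctuation (a trailing period, for instance) would keep that character, so a real crawler would need extra cleanup.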
All the files were reviewed and here are some findings. Let’s start with some statistics:
- 2240 unique Dropbox URLs were found
- 1762 files were downloaded (HTTP 200)
- 116 requests returned an HTTP 403 error
- 332 requests returned an HTTP 404 error
- 45.57 GB were downloaded
- The biggest file was 2.09GB (a RAR archive with WAV files).
- Average file size: 26.32MB
A “403” error corresponds to a bad file name (e.g. a typo in the URL). A “404” means that the file was removed by the Dropbox user. Here we can already draw a conclusion and make a recommendation. When users share files with open links, they often don’t remove the links once the third parties have downloaded the file. To me, shared links are temporary links! Dropbox lets you “cancel” a shared link without deleting the file.
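Counting the outcomes per status code can be sketched as follows. The labels simply mirror the interpretation given above (they are not an official Dropbox definition), and the `tally_outcomes` helper and the sample list of codes are hypothetical.

```python
from collections import Counter

# Labels follow the interpretation given in the text, not an official definition.
OUTCOMES = {
    200: "downloaded",
    403: "bad file name",
    404: "removed by owner",
}

def tally_outcomes(status_codes):
    """Count how many requests fall into each outcome category."""
    counts = Counter()
    for code in status_codes:
        # Any status code not listed above is grouped under "other".
        counts[OUTCOMES.get(code, "other")] += 1
    return counts

codes = [200, 200, 404, 403, 200, 500]  # made-up sample, not the real crawl data
print(tally_outcomes(codes))
```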
What are the most shared file types?
File Type | Found |
data | 1088 |
Zip archive data | 383 |
JPEG image data | 354 |
ZIP archive data, at least v2.0 to extract | 295 |
JPEG image data, EXIF standard | 167 |
JPEG image data, JFIF standard 1.01 | 140 |
RAR archive data, v1d, os: Win32 | 86 |
ZIP archive data, at least v1.0 to extract | 83 |
PDF document, version 1.5 | 71 |
PDF document, version 1.3 | 63 |
PDF document, version 1.4 | 62 |
ISO Media | 60 |
JPEG image data, JFIF standard 1.02 | 45 |
JPEG image data, EXIF standard 2.2 | 44 |
Audio file with ID3 version 2.3.0 | 41 |
ASCII text | 41 |
PE32 executable (GUI) Intel 80386, for MS Windows | 36 |
Microsoft Word 2007+ | 30 |
Microsoft Excel 2007+ | 22 |
JPEG image data, EXIF standard 2.21 | 18 |
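The table lists several variants of the same format separately (three PDF versions, for instance). Here is a minimal sketch of how such rows could be grouped into broader families. The `group_by_family` helper and the rule of cutting each description at the first comma are assumptions for illustration, and only a few rows are copied from the table.

```python
from collections import Counter

# A few rows copied from the table above: (description, number of files).
rows = [
    ("PDF document, version 1.5", 71),
    ("PDF document, version 1.3", 63),
    ("PDF document, version 1.4", 62),
    ("ASCII text", 41),
]

def group_by_family(rows):
    """Sum the counts of all rows sharing the same text before the first comma."""
    totals = Counter()
    for description, count in rows:
        family = description.split(",")[0].strip().lower()
        totals[family] += count
    return totals

totals = group_by_family(rows)
print(totals["pdf document"])  # 71 + 63 + 62 = 196
```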
What were the most obscure file types? Just two examples:
- Fortran source code
- An x86 boot sector
Some filenames were explicit and attracted my attention immediately (like “Report-04-2012.xls“). Doing this exercise, you immediately understand why social engineering attacks are so successful and why people suffer from “clickmania“. It’s really tempting to open such files!
First, the pictures. I was surprised: only one picture was pornographic. Lots of screenshots and error messages were found. I also saw a lot of pictures of goods for sale and, a classic, network diagrams! 50% of the pictures were taken with smartphones and, of course, contained interesting EXIF data (GPS coordinates).
The office documents were also a good source of findings. To summarize briefly, I found:
- A list of weapons (!) for sale (pictures, prices, stocks)
- Political documents (propaganda)
- Resumes (with all private details of course)
- Employees lists
- MDB files (Microsoft Access)
- Business plans
- Attorneys’ documents (infringement reports)
- Meeting minutes
- Manuals (cars, mobile phones, tools)
- Student theses
The best one was surely a complete scan of a real-estate contract, with all details filled in.
Of course, I scanned the files with an anti-virus (ClamAV). Of the 56 executable files found, only 6 were infected with Trojans (10.71%). I also found a lot of Android application package (*.apk) files. I did not extract metadata from the Office files but I’m sure I could find interesting stuff there too.
Another interesting finding? Developers also enjoy the Dropbox sharing feature. I found a lot of source code (HTML, JavaScript, XML, PHP). It’s an easy way to develop and share your source code: no need to upload your source files, just share them and include them in your applications. However, when you download the file directly, the source code is disclosed. Example: https://www.dropbox.com/s/388v3j55z4210e1/test.php.
What can we conclude from this small analysis? Dropbox links do not reveal who shared the file. There is no way to trace the account owner, unless personal information is disclosed in the shared file. And… it is! Shared files are difficult to exploit to collect information about a target (during the reconnaissance phase of a planned attack). Still, keep in mind that shared files can be read by anybody! This feature must be used with due care and attention. If you really need to share sensitive data, encrypt it! Which is always a good idea when sending files into the Dropbox cloud…
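As a quick check of the infection rate quoted above, the percentage is simply the number of infected files divided by the number of executables. The `infection_rate` helper is a hypothetical name used only for this illustration.

```python
def infection_rate(infected, total):
    """Return the share of infected files as a percentage, rounded to two decimals."""
    return round(infected / total * 100, 2)

print(infection_rate(6, 56))  # 10.71
```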
My goal was not to target a specific user by crawling all his/her files. As said in the last paragraph, you cannot link the URLs to a particular user.
Dropbox is pretty secure. Crawling URLs using Google is a well-known practice.
Those “s” URLs must be manually created by a user (and can be removed immediately). Scanning ‘sequential’ URLs is like typing random “goo.gl” URLs: you are not getting the files of a single user but random files from random users.
Same thing happens with t.co and any other shortening service.
The files in the public folder, instead, are NOT public and it’s even much easier to manage. Only knowing the exact file name or folder name allows you to see the file.
This post is pointless IMHO.
You are right! Typo? I fixed this!
No, I didn’t. Of course, it could be interesting to find a relation between the account and the generated string! 🙂 But it should be generated randomly! I’ll try to find some time to check this!
Did you perform any analysis on the 15 character string?
To clarify, does it mean that
– only files in Dropbox for which the owner has done a “Dropbox -> Get Link“ or
– only files stored in the Public folder or
– only files stored in a Shared Folder or
– all files in Dropbox are open to this type of search?
For your info, I posted a similar thread in the Dropbox forum (http://forums.dropbox.com/topic.php?id=60856&replies=1#post-435149). I hope that Dropbox personnel can clarify.
Interesting article!
Just one little point though, 6 out of 56 is not 3%? Or did I miss something?