Wikipedia Database Download
English-language Wikipedia
Dumps from any Wikimedia Foundation project: dumps.wikimedia.org (https://dumps.wikimedia.org/) and the Internet Archive
English Wikipedia dumps in SQL and XML: dumps.wikimedia.org/enwiki/ (https://dumps.wikimedia.org/enwiki/) and the Internet Archive (https://archive.org/search.php?query=subject%3A%22enwiki%22%20AND%20subject%3A%22data%20dumps%22%20AND%20collection%3A%22wikimediadownloads%22)
Download (https://meta.wikimedia.org/wiki/Data_dump_torrents#English_Wikipedia) the
data dump using a BitTorrent client (torrenting has many benefits and reduces server
load, saving bandwidth costs).
pages-articles-multistream.xml.bz2 – Current revisions only, no talk or user pages; this is
probably what you want, and is over 25 GB compressed (expands to over 105 GB when
decompressed). Note that it is not necessary to decompress the multistream dumps in
the majority of cases.
pages-meta-current.xml.bz2 – Current revisions only, all pages (including talk)
abstract.xml.gz – page abstracts
all-titles-in-ns0.gz – Article titles only (with redirects)
SQL files for the pages and links are also available
All revisions, all pages: These files expand to multiple terabytes of text. Please only
download these if you know you can cope with this quantity of data. Go to Latest
Dumps (https://dumps.wikimedia.org/enwiki/latest/) and look out for all the files that have
'pages-meta-history' in their name.
To download a subset of the database in XML format, such as a specific category or a list of articles, see Special:Export; its usage is described at Help:Export.
Wiki front-end software: MediaWiki [1] (https://www.mediawiki.org).
Database backend software: MySQL.
Image dumps: See below.
Note that the multistream dump file contains multiple bz2 'streams' (each with its own bz2 header, body, and footer) concatenated into one file, in contrast to the vanilla file, which contains a single stream. Each separate 'stream' (really, a complete bz2 file) in the multistream dump contains 100 pages, except possibly the last one. Using the byte offset found in the index, cut a small part out of the archive with dd; you can then either decompress it with bzip2 or use bzip2recover, and search the first file for the article ID.
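The same offset-based extraction can be done without dd, using Python's standard library. This is a minimal sketch, assuming the multistream dump sits alongside its bz2-compressed index file, whose lines have the form `byte_offset:page_id:page_title`; the function name is illustrative, not part of any tool.

```python
import bz2

def extract_stream(dump_path, index_path, wanted_title):
    """Decompress only the ~100-page stream that contains wanted_title."""
    offset = None
    # The index is itself bz2-compressed; each line is "offset:page_id:title".
    with bz2.open(index_path, "rt", encoding="utf-8") as index:
        for line in index:
            off, _page_id, title = line.rstrip("\n").split(":", 2)
            if title == wanted_title:
                offset = int(off)
                break
    if offset is None:
        return None  # title not present in this dump's index
    with open(dump_path, "rb") as dump:
        dump.seek(offset)  # jump straight to the stream, like dd skip=
        decomp = bz2.BZ2Decompressor()
        chunks = []
        # A single stream ends at its own bz2 footer; the decompressor
        # sets .eof there and ignores the bytes of the following streams.
        while not decomp.eof:
            data = dump.read(64 * 1024)
            if not data:
                break  # truncated file
            chunks.append(decomp.decompress(data))
    return b"".join(chunks).decode("utf-8")
```

The returned text is a fragment of the dump's XML (up to 100 `<page>` elements), which can then be searched for the wanted article.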
Other languages
In the dumps.wikimedia.org (https://dumps.wikimedia.org/) directory you will find the latest SQL and
XML dumps for the projects, not just English. The sub-directories are named for the language code and
the appropriate project. Some other directories (e.g. simple, nostalgia) exist, with the same structure.
These dumps are also available from the Internet Archive.
Unlike most article text, images are not necessarily licensed under the GFDL & CC-BY-SA-4.0. They
may be under one of many free licenses, in the public domain, believed to be fair use, or even copyright
infringements (which should be deleted). In particular, use of fair use images outside the context of
Wikipedia or similar works may be illegal. Images under most licenses require a credit, and possibly
other attached copyright information. This information is included in image description pages, which are
part of the text dumps available from dumps.wikimedia.org (https://dumps.wikimedia.org/). In
conclusion, download these images at your own risk (Legal (https://dumps.wikimedia.org/legal.html)).
Windows
Beginning with Windows XP, a basic decompression program enables decompression of zip files.[2][3]
Among others, the following can be used to decompress bzip2 files.
Macintosh (Mac)
GNU/Linux
Some BSD systems ship with the command-line bzip2 tool as part of the operating system.
Others, such as OpenBSD, provide it as a package which must first be installed.
Notes
1. Some older versions of bzip2 may not be able to handle files larger than 2 GB, so make
sure you have the latest version if you experience any problems.
2. Some older archives are compressed with gzip, which is compatible with PKZIP (the most
common Windows format).
Before starting a download of a large file, check the storage device to ensure its file system can support
files of such a large size, check the amount of free space to ensure that it can hold the downloaded file,
and make sure the device(s) you'll use the storage with are able to read your chosen file system.
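The free-space check can be scripted; a minimal sketch using Python's standard library (the directory path and byte figure are examples, with 105 GB being the decompressed enwiki size quoted above):

```python
import shutil

def can_hold(target_dir, needed_bytes):
    # Query the file system backing target_dir for its free space
    # before starting a large download.
    usage = shutil.disk_usage(target_dir)
    return usage.free >= needed_bytes

# Example: the decompressed enwiki pages-articles dump is over 105 GB:
# can_hold("/mnt/storage", 105 * 10**9)
```

Note that this only checks free space; the per-file maximum size of the file system (for example FAT32's 4 GB cap) still has to be checked separately.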
Windows
FAT16 supports files up to 2 GB. FAT16 is the factory format of smaller USB drives and all
SD cards that are 2 GB or smaller.
FAT32 supports files up to 4 GB. FAT32 is the factory format of larger USB drives and all
SDHC cards that are 4 GB or larger.
exFAT supports files up to 127 PB. exFAT is the factory format of all SDXC cards, but is
incompatible with most flavors of UNIX due to licensing problems.
NTFS supports files up to 16 TB. NTFS is the default file system for modern Windows
computers, including Windows 2000, Windows XP, and all their successors to date. Versions
after Windows 8 can support larger files if the file system is formatted with a larger cluster
size.
ReFS supports files up to 16 EB.
Macintosh (Mac)
HFS Plus (HFS+) (also known as Mac OS Extended) supports files up to 8 EiB (2^63
bytes).[4] An exbibyte (2^60 bytes) is slightly larger than an exabyte (10^18 bytes). HFS
Plus is supported on macOS 10.2+ and iOS. It was the default file system for macOS
computers until the release of macOS High Sierra in 2017, when it was replaced as the
default by Apple File System (APFS).
APFS supports files up to 8 exbibytes (2^63 bytes).[4]
Linux
ext2 and ext3 support files up to 16 GB, but up to 2 TB with larger block sizes. See
https://users.suse.com/~aj/linux_lfs.html for more information.
ext4 supports files up to 16 TB, using 4 KB block size. (limit removed in e2fsprogs-1.42
(2012) (https://fedoraproject.org/wiki/Features/F17Ext4Above16T))
XFS supports files up to 8 EB.
ReiserFS supports files up to 1 EB, 8 TB on 32-bit systems.
JFS supports files up to 4 PB.
Btrfs supports files up to 16 EB.
NILFS supports files up to 8 EB.
YAFFS2 supports files up to 2 GB.
FreeBSD
Windows
Linux
32-bit kernel 2.4.x systems have a 2 TB limit for all file systems.
64-bit kernel 2.4.x systems have an 8 EB limit for all file systems.
32-bit kernel 2.6.x systems without option CONFIG_LBD have a 2 TB limit for all file
systems.
32-bit kernel 2.6.x systems with option CONFIG_LBD and all 64-bit kernel 2.6.x systems
have an 8 ZB limit for all file systems.[5]
Android: Android is based on Linux, which determines its base limits.
iOS internal storage:
All iOS devices support HFS Plus (HFS+) for internal storage; no devices have external
storage slots. Devices on iOS 10.3 or later run the Apple File System (APFS), which
supports a maximum file size of 8 EB.
Tips
If you want to get all the data, you'll probably want to transfer it in the most efficient way
possible. The wikipedia.org servers need to do quite a bit of work to convert wikicode into
HTML, which is time-consuming both for you and for the wikipedia.org servers, so simply
spidering all pages is not the way to go.
To access any article in XML, one at a time, access Special:Export/Title of the article.
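For scripted access, the Special:Export URL can be built by percent-encoding the title. A minimal sketch (the function names are illustrative; the actual fetch needs network access, so it is kept separate):

```python
from urllib.parse import quote
from urllib.request import urlopen

def export_url(title, site="https://en.wikipedia.org"):
    # Special:Export/<title> returns the article's current revision as XML.
    # Spaces become underscores; other special characters are percent-encoded.
    return site + "/wiki/Special:Export/" + quote(title.replace(" ", "_"))

def fetch_export(title):
    # Download the XML export of a single article (requires network access).
    with urlopen(export_url(title)) as resp:
        return resp.read().decode("utf-8")
```

Remember that fetching pages one at a time is exactly the slow path described above; prefer the dumps for anything beyond a handful of articles.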
Please be aware that live mirrors of Wikipedia that are dynamically loaded from the Wikimedia servers
are prohibited. Please see Wikipedia:Mirrors and forks.
If you want information on how to get our content more efficiently, we offer a variety of
methods, including weekly database dumps which you can load into MySQL and crawl
locally at any rate you find convenient. Tools are also available which will do that for you
as often as you like once you have the infrastructure in place.
Instead of an email reply, you may prefer to visit the #mediawiki channel (https://web.libera.chat/?channel=#mediawiki) on irc.libera.chat to discuss your options with our team.
Database schema
SQL schema
See also: mw:Manual:Database layout
The SQL file used to initialize a MediaWiki database can be found here (https://phabricator.wikimedia.org/source/mediawiki/browse/master/sql/mysql/tables-generated.sql).
XML schema
The XML schema for each dump is defined at the top of the file and described in the MediaWiki export
help page.
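Because a full XML dump does not fit in memory, it is usually read as a stream. A minimal sketch using the standard library's incremental parser; the element names come from the export schema, but since the exact namespace version varies between dumps, tags are matched by local name here:

```python
import bz2
import xml.etree.ElementTree as ET

def iter_titles(dump_path):
    # Stream <page> elements from a (possibly bz2-compressed) XML dump
    # without building the whole tree in memory.
    opener = bz2.open if dump_path.endswith(".bz2") else open
    with opener(dump_path, "rb") as f:
        for _event, elem in ET.iterparse(f):
            # Tags carry the export namespace, e.g. "{...export-0.10/}page",
            # so compare only the local part of the tag name.
            if elem.tag.rsplit("}", 1)[-1] == "page":
                title = elem.find("{*}title")
                yield title.text if title is not None else None
                elem.clear()  # release the finished subtree
```

The same pattern extends to extracting revision text or other fields; for the (non-multistream) .bz2 files, this streams straight through the compressed file without decompressing it to disk first.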
If you want to draft a traditional website in MediaWiki and dump it to HTML format, you might want to try mw2html (https://barnesc.blogspot.com/2005/10/mw2html-export-mediawiki-to-static.html) by User:Connelly.
If you'd like to help develop dump-to-static HTML tools, please drop us a note on the
developers' mailing list.
Static HTML dumps as of 2008 are available here (https://dumps.wikimedia.org/other/static_html_dumps/).
See also:
mw:Alternative parsers lists some other options for getting static HTML dumps, though some of them no longer work
Wikipedia:Snapshots
Wikipedia:TomeRaider database
Kiwix
Kiwix is by far the largest offline distribution of Wikipedia to
date. As an offline reader, Kiwix works with a library of
content in ZIM files: you can pick and choose any
Wikimedia project (Wikipedia in any language, Wiktionary,
Wikisource, etc.), as well as TED Talks, PhET Interactive
Maths & Physics simulations, Project Gutenberg, etc.
It is free and open source, and currently available for download on:
Android (https://play.google.com/store/apps/details?id=org.kiwix.kiwixmobile)
iOS (https://itunes.apple.com/us/app/kiwix/id997079563?mt=8)
macOS (https://apps.apple.com/us/app/kiwix-desktop/id1275066656)
Windows (https://download.kiwix.org/release/kiwix-desktop/kiwix-desktop_windows_x64.zip)
& Windows 10 (UWP) (https://www.microsoft.com/store/apps/9P8SLZ4J979J)
GNU/Linux (https://flathub.org/apps/details/org.kiwix.desktop)
... as well as extensions for the Chrome (https://chrome.google.com/webstore/detail/kiwix/donaljnlmapmngakoipdmehbfcioahhk) & Firefox (https://addons.mozilla.org/fr/firefox/addon/kiwix-offline/) browsers, server solutions, etc. See the official website (https://www.kiwix.org/en/) for the complete Kiwix portfolio.
Old dumps
The static version of Wikipedia created by Wikimedia: http://static.wikipedia.org/ – apparently offline since Feb. 11, 2013, with no content remaining.
Wiki2static (http://www.tommasoconforti.com/) (site down as of October 2005) was an experimental program set up by User:Alfio to generate HTML dumps, inclusive of images, a search function and an alphabetical index. Experimental dumps and the script itself could be downloaded at the linked site. As an example, it was used to generate these copies: English WikiPedia 24 April 04 (http://fixedreference.org/en/20040424/wikipedia/Main_Page) and Simple WikiPedia 1 May 04 (https://web.archive.org/web/20040618150011/http://fixedreference.org/simple/20040501/wikipedia/Main_Page) (old database format), and English WikiPedia 24 July 04 (http://july.fixedreference.org/en/20040724/wikipedia/Main_Page), Simple WikiPedia 24 July 04 (http://july.fixedreference.org/simple/20040724/wikipedia/Main_Page) and WikiPedia Français 27 Juillet 2004 (http://july.fixedreference.org/fr/20040727/wikipedia/Accueil) (new format). BozMo uses a version to generate periodic static copies at fixed reference (http://fixedreference.org/) (site down as of October 2017).
XOWA
XOWA is a free, open-source application that helps download Wikipedia to a computer. Access all of
Wikipedia offline, without an internet connection! It is currently in the beta stage of development, but is
functional. It is available for download here (http://xowa.org/home/wiki/Help/Download_XOWA.html).
Features
Displays all articles from Wikipedia without an internet connection.
Download a complete, recent copy of English Wikipedia.
Display 5.2+ million articles in full HTML formatting.
Show images within an article. Access 3.7+ million images using the offline image
databases.
Works with any Wikimedia wiki, including Wikipedia, Wiktionary, Wikisource, Wikiquote,
Wikivoyage (also some non-wmf dumps)
Works with any non-English language wiki such as French Wikipedia, German Wikisource,
Dutch Wikivoyage, etc.
Works with other specialized wikis such as Wikidata, Wikimedia Commons, Wikispecies, or
any other MediaWiki generated dump
Set up over 660 other wikis, including:
English Wiktionary
English Wikisource
English Wikiquote
English Wikivoyage
Non-English wikis, such as French Wiktionary, German Wikisource, Dutch Wikivoyage
Wikidata
Wikimedia Commons
Wikispecies
... and many more!
Update your wiki whenever you want, using Wikimedia's database backups.
Navigate between offline wikis. Click on "Look up this word in Wiktionary" and instantly view
the page in Wiktionary.
Edit articles to remove vandalism or errors.
Install to a flash memory card for portability to other machines.
Run on Windows, Linux and Mac OS X.
View the HTML for any wiki page.
Search for any page by title using a Wikipedia-like Search box.
Browse pages by alphabetical order using Special:AllPages.
Find a word on a page.
Access a history of viewed pages.
Bookmark your favorite pages.
Downloads images and other files on demand (when connected to the internet)
Sets up Simple Wikipedia in less than 5 minutes
Can be customized at many levels: from keyboard shortcuts to HTML layouts to internal
options
Main features
1. Very fast searching
2. Keyword (actually, title words) based searching
3. Search produces multiple possible articles: you can choose amongst them
4. LaTeX based rendering for mathematical formulae
5. Minimal space requirements: the original .bz2 file plus the index
6. Very fast installation (a matter of hours) compared to loading the dump into MySQL
WikiFilter
WikiFilter (http://wikifilter.sourceforge.net/) is a program which allows you to browse over 100 dump
files without visiting a Wiki site.
WikiTaxi usage
1. Download WikiTaxi and extract to an empty folder. No installation is otherwise required.
2. Download the XML database dump (*.xml.bz2) of your favorite wiki.
3. Run WikiTaxi_Importer.exe to import the database dump into a WikiTaxi database. The
importer uncompresses the dump as it imports, so to save drive space, do not
uncompress it beforehand.
4. When the import is finished, start up WikiTaxi.exe and open the generated database file.
You can start searching, browsing, and reading immediately.
5. After a successful import, the XML dump file is no longer needed and can be deleted to
reclaim disk space.
6. To update an offline Wiki for WikiTaxi, download and import a more recent database dump.
For WikiTaxi reading, only two files are required: WikiTaxi.exe and the .taxi database. Copy them to any
storage device (memory stick or memory card) or burn them to a CD or DVD and take your Wikipedia
with you wherever you go!
BzReader and MzReader (for Windows)
BzReader (https://code.google.com/archive/p/bzreader/) is an offline Wikipedia reader with fast search
capabilities. It renders the Wiki text into HTML and doesn't need to decompress the database. Requires
Microsoft .NET framework 2.0.
EPWING
Offline Wikipedia databases in the EPWING dictionary format, a common though dated Japanese
Industrial Standard (JIS), can be read, including thumbnail images and tables with some
rendering limits, on any system where a reader is available (Boookends (https://sites.google.com/site/boookends)). There are many free and commercial readers for Windows (including Mobile), Mac OS X,
iOS (iPhone, iPad), Android, Unix-Linux-BSD, DOS, and Java-based browser applications (EPWING
Viewers (http://maximilk.web.fc2.com/viewers.htm)).
Mirror building
WP-MIRROR
Important: WP-MIRROR has not been supported since 2014, and community verification is
needed that it actually works. See the talk page.
WP-MIRROR is a free utility for mirroring any desired set of WMF wikis. That is, it builds a wiki farm
that the user can browse locally. WP-MIRROR builds a complete mirror with original size media files.
WP-MIRROR is available for download (http://www.nongnu.org/wp-mirror/).
See also
DBpedia
WikiReader
mw:Help:Export
m:Help:Downloading pages
m:Help:Import
Meta:Data dumps/Other tools, for related tools, e.g. extractors and "dump readers"
Wikipedia:Wikipedia CD Selection
Wikipedia:Size of Wikipedia
meta:Mirroring Wikimedia project XML dumps
meta:Static version tools
Wikimedia offline projects (https://meta.wikimedia.org/wiki/Offline_Projects)
References
1. See Wikipedia:Reusing Wikipedia content § Re-use of text under the GNU Free
Documentation License for more information on compatibility with the GFDL.
2. "Benchmarked: What's the Best File Compression Format?" (https://www.howtogeek.com/200698/benchmarked-whats-the-best-file-compression-format/). How-To Geek, LLC. Retrieved 18 January 2017.
3. "Zip and unzip files" (https://support.microsoft.com/en-us/help/14200/windows-compress-uncompress-zip-files). Microsoft. Retrieved 18 January 2017.
4. "Volume Format Comparison" (https://developer.apple.com/library/archive/documentation/FileManagement/Conceptual/APFS_Guide/VolumeFormatComparison/VolumeFormatComparison.html). developer.apple.com. Retrieved 2023-11-19.
5. Large File Support in Linux (http://www.suse.com/~aj/linux_lfs.html)
6. Android 2.2 and before used the YAFFS file system; December 14, 2010. (http://www.h-online.com/open/news/item/Android-2-3-Gingerbread-to-use-Ext4-file-system-1152775.html)
External links
Wikimedia downloads (https://dumps.wikimedia.org/).
Domas visits logs (http://dammit.lt/wikistats/) (read this! (http://infodisiac.com/blog/2010/07/wikimedia-page-views-some-good-and-bad-news/)). Also, old data (https://archive.org/details/wikipedia_visitor_stats_200712) in the Internet Archive.
Wikimedia mailing lists archives.
User:Emijrp/Wikipedia Archive. An effort to find all the Wiki[mp]edia available data, and to
encourage people to download it and save it around the globe.
Script to download all Wikipedia 7z dumps (https://github.com/WikiTeam/wikiteam/blob/master/wikipediadownloader.py).