25 Sep 2007

Importing Wikimedia Dump File to MySQL

These steps import a Wikimedia XML dump file (here, the English Wiktionary) into MySQL.

  1. Create a “wiktionary” schema (database)

  2. Import the MediaWiki table structure using tables.sql (found in MediaWiki’s maintenance directory)

    $ mysql -u root -p wiktionary < tables.sql

  3. Import the dump file into your new database using mwdumper.jar

    $ java -jar mwdumper.jar --format=sql:1.5 enwiktionary-20070914-pages-meta-current.xml.bz2 | mysql -u root -p wiktionary
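
Step 1 has no command above, so here is a minimal sketch of all three steps as one shell function. The database name, user, and file names come from the steps above; the function name `import_dump` and the `CREATE DATABASE IF NOT EXISTS` statement are my own assumptions, so adjust them to your setup.

```shell
#!/bin/sh
# Sketch only: names below are taken from the steps above.
DB=wiktionary
DUMP=enwiktionary-20070914-pages-meta-current.xml.bz2

# Hypothetical helper wrapping steps 1-3; each step runs only if the
# previous one succeeded.
import_dump() {
    # 1. Create the schema (assumed statement, not given in the post)
    mysql -u root -p -e "CREATE DATABASE IF NOT EXISTS $DB" &&
    # 2. Load the MediaWiki table definitions
    mysql -u root -p "$DB" < tables.sql &&
    # 3. Convert the XML dump to SQL and pipe it into MySQL
    java -jar mwdumper.jar --format=sql:1.5 "$DUMP" | mysql -u root -p "$DB"
}
```

Calling `import_dump` from the directory that holds tables.sql, mwdumper.jar and the dump file runs the whole import. Because of `-p`, mysql asks for the password once per step.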

Optional

Copy the ‘old_text’ column from the ‘text’ table to the ‘page’ table:

    mysql> UPDATE `wiktionary`.`page`, `wiktionary`.`text` SET `wiktionary`.`page`.`old_text` = `wiktionary`.`text`.`old_text` WHERE `wiktionary`.`page`.`page_latest` = `wiktionary`.`text`.`old_id`;
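
The stock MediaWiki ‘page’ table has no ‘old_text’ column, so the UPDATE above will only work if one has been added first. A minimal sketch of that prerequisite follows. The file name add_old_text.sql is hypothetical, and the MEDIUMBLOB type is an assumption meant to match ‘text’.‘old_text’; check your own tables.sql before using it.

```shell
# Write the one-off schema change to a file (hypothetical file name).
# The quoted 'SQL' delimiter keeps the backticks literal.
cat > add_old_text.sql <<'SQL'
-- Assumption: MEDIUMBLOB matches the type of text.old_text in tables.sql
ALTER TABLE `page` ADD COLUMN `old_text` MEDIUMBLOB;
SQL
# Apply it before the UPDATE with:
#   mysql -u root -p wiktionary < add_old_text.sql
```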

Delete non-English rows:

    mysql> DELETE FROM `wiktionary`.`page` WHERE `old_text` NOT LIKE '%==English==%';

Remove rows not in namespace 0, the main namespace (see Wiktionary:Namespace):

    mysql> DELETE FROM `wiktionary`.`page` WHERE `wiktionary`.`page`.`page_namespace` != 0;
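
To check that the two DELETE statements did what you expected, a quick count can help. This is only a sketch: the file name check_counts.sql is hypothetical, and it assumes the namespace column is called ‘page_namespace’, as in the stock MediaWiki schema.

```shell
# Write two sanity-check queries to a file (hypothetical file name).
cat > check_counts.sql <<'SQL'
-- Rows left in page after the clean-up
SELECT COUNT(*) AS remaining_rows FROM `page`;
-- Should report 0 once the namespace filter has run
SELECT COUNT(*) AS outside_main_namespace FROM `page` WHERE `page_namespace` != 0;
SQL
# Run with:
#   mysql -u root -p wiktionary < check_counts.sql
```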