Thursday, March 7, 2013

Migrating MediaWiki to Confluence

So today I will be moving a wiki that lives on a server running MediaWiki with a MySQL database into a Confluence wiki, also backed by a MySQL server.

Our prerequisites are as follows:

1. Access to the MediaWiki directory and MySQL server (at least read access)
2. A DEV or QA server with a MySQL server installed (where we'll also put the UWC, the Universal Wiki Converter)
3. The Confluence instance, preferably a QA one rather than the production one.

First, we will need the images directory from the MediaWiki server. You can verify its location by opening up the LocalSettings.php file and looking for this line:

$wgUploadPath       = "$wgScriptPath/images";

A few lines above it you will see a line like this:

$wgScriptPath       = "/somewiki";

and that will tell you where the images directory is; in this case it is /opt/somewiki/images
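
If you'd rather not scan the file by hand, a quick grep pulls out both settings (assuming the wiki lives under /opt/somewiki as above):

[root@mediawiki ]# grep -E 'wgScriptPath|wgUploadPath' /opt/somewiki/LocalSettings.php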

Change into the images directory and tar up its contents (writing the archive one level up, so tar doesn't try to include the archive in itself):

[root@mediawiki images]# tar cvf ../all_wiki_images.tar .
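
A quick listing confirms the archive looks sane before we move it anywhere:

[root@mediawiki images]# tar tvf ../all_wiki_images.tar | head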

Now we need a dump of the database, but first we have to check which database MediaWiki uses and which username/password it connects with.

Again, look for these lines in the same file:

$wgDBserver         = "localhost";      
$wgDBname           = "mediawikidb";                 
$wgDBuser           = "db_user";
$wgDBpassword       = "db_pass";
$wgDBprefix         = "";
$wgDBtype           = "mysql";
$wgDBport           = "3306";
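
Before taking the dump, it's worth confirming those credentials actually work. The page table is part of MediaWiki's standard schema, so a quick row count doubles as a sanity check (run this on the MediaWiki server, since $wgDBserver is localhost):

[root@mediawiki ]# mysql -u db_user -pdb_pass mediawikidb -e 'SELECT COUNT(*) FROM page;'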



So as you can see, we have all the info we need; we can now take the dump as follows:

[root@mediawiki ]# mysqldump --single-transaction -u db_user -pdb_pass mediawikidb > wikidump-mediawikidb-03-07-2013.sql

This gives us the dump file that we will import into the MySQL database on the QA server.
At this point we are finished with the original server. You may want to put up a notice asking users not to add any more content, or just make the whole directory mode 700 (assuming your webserver user doesn't own the directory!).
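
Another option is MediaWiki's built-in read-only mode; a line like this in LocalSettings.php blocks edits and shows visitors the message (the wording here is just an example):

$wgReadOnly = 'This wiki is being migrated to Confluence and is temporarily read-only.';

Then copy the image tarball and the SQL dump over to the DEV/QA box; plain scp will do (the hostname and paths here are placeholders, adjust to wherever you created the files):

[root@mediawiki ]# scp /opt/somewiki/all_wiki_images.tar /root/wikidump-mediawikidb-03-07-2013.sql root@dev01:/root/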

We will now need to create a database for this MediaWiki DB on the QA server:

mysql> create database mediawikidb;
mysql> GRANT ALL PRIVILEGES ON mediawikidb.* TO db_user@localhost IDENTIFIED BY 'db_pass';
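
Before importing, it doesn't hurt to check that the new account can actually connect and see the database:

[root@dev01 ]# mysql -u db_user -pdb_pass -e 'SHOW DATABASES;'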

Then let's import the database dump into the MySQL server:

From the command line (not the MySQL console) run:

[root@dev01 ]# mysql -u db_user -p mediawikidb < wikidump-mediawikidb-03-07-2013.sql

It will prompt you for the password and then, depending on how large the dump is, take a while to import everything.
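
A quick way to confirm the content made it across is to count rows in MediaWiki's standard page and revision tables:

mysql> USE mediawikidb;
mysql> SELECT COUNT(*) FROM page;
mysql> SELECT COUNT(*) FROM revision;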

I can now check that everything is there:

mysql> SELECT table_schema "database_name",
    ->        SUM(data_length + index_length) / 1024 / 1024 "Data Base Size in MB",
    ->        SUM(data_free) / 1024 / 1024 "Free Space in MB"
    -> FROM information_schema.TABLES
    -> GROUP BY table_schema;
+--------------------+----------------------+------------------+
| database_name      | Data Base Size in MB | Free Space in MB |
+--------------------+----------------------+------------------+
| c4_1               |       25047.82812500 |     590.00000000 |
| information_schema |           0.00781250 |       0.00000000 |
| mysql              |           0.63066769 |       0.00000000 |
| mediawikidb        |         127.62810230 |     420.00000000 |
+--------------------+----------------------+------------------+
4 rows in set (7.53 sec)

mysql>

And as you can see, the newly imported mediawikidb database is there, about 127 MB, and ready to go.

Now we will set up the UWC (Universal Wiki Converter) to export the MediaWiki content from the database, match it with the images in the directory we copied, and import it all into Confluence.

We now need to go into Confluence and create a new space; let's call it mediawiki01, which will also be the space key.



We will need to create a settings file in the conf/ directory (in my case conf/confluence.mediawiki-with-history, which is the file we'll pass to the converter later); it will look like this:

#Tue Feb 12 07:24:53 PST 2013
current.tab.index=0
space=mediawiki01
url=10.18.97.228\:8090
trustpass=
pages=/home/boaz/mediawiki01-wiki/pages
uploadOrphanAttachments=false
pageChooserDir=
attachments=/home/boaz/mediawiki01-wiki/images
trustall=
attachment.size.max=-1
sendToConfluence=true
pattern=
login=admin
truststore=
feedback.option=true
password=password
wikitype=mediawiki

Here is a quick rundown of the lines that matter for this tutorial: space is the key of the Confluence space we just created, url is the host and port of the Confluence instance, login and password are Confluence credentials allowed to create content in that space, pages and attachments point at the exported pages and at the images directory we copied over, and wikitype tells the UWC this is a MediaWiki export.



Then we will need to edit the converter.mediawiki.properties file if we want to tweak any settings. In my case I wanted to keep the original users that created each page (if you don't turn this on, all the pages will be owned by the user doing the import into the Confluence space), as well as the page histories.

To include user and timestamp data with page histories:

Export your MediaWiki data with the udmf property set to true:

1. In the conf/exporter.mediawiki.properties file, uncomment:


udmf=true




2. Install the UDMF plugin on your Confluence instance. (Important: the original username/create date will not be applied unless this plugin is installed.)

3. In the converter.mediawiki.properties file (under conf/), uncomment:
Mediawiki.0004.userdate.class=com.atlassian.uwc.converters.mediawiki.UserDateConverter


4. Optionally, in your converter.mediawiki.properties, if the users in your MediaWiki will not be exactly the same users in Confluence (i.e. both pulling from the same LDAP or AD), uncomment and set to false:

Mediawiki.0004.users-must-exist.property=false




You will also need to edit the conf/exporter.mediawiki.properties file; it is pretty self-explanatory.
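
As a rough sketch, the settings boil down to telling the exporter how to reach the MediaWiki database and where to write the exported pages. The key names below are assumptions and may differ in your UWC version, so check them against the comments in the exporter.mediawiki.properties that ships with your copy:

# illustrative values only -- verify the exact key names against your UWC version
databaseName=mediawikidb
dbUrl=jdbc:mysql://localhost:3306
login=db_user
password=db_pass
output=/home/boaz/mediawiki01-wiki
udmf=true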

One step to complete before you start the import is to make sure remote API access is enabled in Confluence.

Go into "General Configuration" and look for a checkmark next to "Remote API (XML-RPC & SOAP)".
If there isn't one, make sure to enable it, otherwise the import won't work.




Then we will run the convert/export like this:

[root@dev01 ]# ./run_cmdline.sh conf/confluence.mediawiki-with-history conf/converter.mediawiki.properties
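
With over six thousand pages the upload takes a while, so you may want to run it detached and capture the output in a log file (plain nohup shown here; screen or tmux work just as well):

[root@dev01 ]# nohup ./run_cmdline.sh conf/confluence.mediawiki-with-history conf/converter.mediawiki.properties > uwc-import.log 2>&1 &
[root@dev01 ]# tail -f uwc-import.log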


You will see many entries roll by, and once the conversion is done it will start uploading pages like this:

2013-03-06 21:28:32,888 INFO  [main] - Uploaded 2200 out of 6613 pages.
2013-03-06 21:28:36,269 INFO  [main] - Uploaded 2210 out of 6613 pages.
2013-03-06 21:28:39,702 INFO  [main] - Uploaded 2220 out of 6613 pages.
2013-03-06 21:28:42,993 INFO  [main] - Uploaded 2230 out of 6613 pages.
2013-03-06 21:28:46,214 INFO  [main] - Uploaded 2240 out of 6613 pages.
2013-03-06 21:28:49,487 INFO  [main] - Uploaded 2250 out of 6613 pages.
2013-03-06 21:28:52,771 INFO  [main] - Uploaded 2260 out of 6613 pages.
2013-03-06 21:28:56,051 INFO  [main] - Uploaded 2270 out of 6613 pages.
2013-03-06 21:28:59,435 INFO  [main] - Uploaded 2280 out of 6613 pages.
2013-03-06 21:29:02,735 INFO  [main] - Uploaded 2290 out of 6613 pages.
2013-03-06 21:29:06,083 INFO  [main] - Uploaded 2300 out of 6613 pages.
2013-03-06 21:29:09,476 INFO  [main] - Uploaded 2310 out of 6613 pages.
2013-03-06 21:29:12,878 INFO  [main] - Uploaded 2320 out of 6613 pages.
2013-03-06 21:29:16,218 INFO  [main] - Uploaded 2330 out of 6613 pages.
2013-03-06 21:29:19,468 INFO  [main] - Uploaded 2340 out of 6613 pages.
2013-03-06 21:29:23,538 INFO  [main] - Uploaded 2350 out of 6613 pages.
2013-03-06 21:29:26,804 INFO  [main] - Uploaded 2360 out of 6613 pages.
2013-03-06 21:29:29,785 INFO  [main] - Uploaded 2370 out of 6613 pages.
2013-03-06 21:29:32,740 INFO  [main] - attachment written JPAValidationSampleRegEx.png
2013-03-06 21:29:32,740 INFO  [main] - Attachment Uploaded: /home/boaz/mediawiki/images/images/9/9f/JPAValidationSampleRegEx.png
2013-03-06 21:29:32,773 INFO  [main] - attachment written JPAValidationSampleTester.png
2013-03-06 21:29:32,773 INFO  [main] - Attachment Uploaded: /home/boaz/mediawiki/images/images/6/63/JPAValidationSampleTester.png
2013-03-06 21:29:32,808 INFO  [main] - attachment written JPAValidationSample.png
2013-03-06 21:29:32,808 INFO  [main] - Attachment Uploaded: /home/boaz/mediawiki/images/images/9/96/JPAValidationSample.png
2013-03-06 21:29:32,843 INFO  [main] - Uploaded 2380 out of 6613 pages.
2013-03-06 21:29:35,873 INFO  [main] - Uploaded 2390 out of 6613 pages.



That's all. The MediaWiki content now sits inside a new space on your Confluence wiki.