Thursday, March 7, 2013

Migrating MediaWiki to Confluence

Today I will be moving a wiki that sits on a server running MediaWiki with a MySQL database into a Confluence wiki, also backed by a MySQL server.

Our prerequisites are as follows:

1. Access to the MediaWiki directory and its MySQL server (at least read access)
2. A DEV or QA server with MySQL server installed (where we'll also put the UWC, the Universal Wiki Converter)
3. The Confluence instance, preferably a QA one, not the production one.

First, we will need the images directory from the MediaWiki server. You can find its location by opening the LocalSettings.php file and looking for this line:

$wgUploadPath       = "$wgScriptPath/images";

A few lines above it you will see a line like this:

$wgScriptPath       = "/somewiki";

Together, these tell you where the images directory is, relative to the web root. In this case it is /opt/somewiki/images
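
As a rough illustration of that lookup, the Python sketch below pulls the two variables out of LocalSettings.php text and expands $wgScriptPath inside $wgUploadPath. The function names and the /opt web root are assumptions made for this example (the web root is whatever your web server is configured with), and it only understands simple quoted assignments, not includes or PHP expressions.

```python
import re

# Matches simple assignments such as: $wgScriptPath = "/somewiki";
ASSIGNMENT = re.compile(r'^\s*\$(\w+)\s*=\s*"([^"]*)"\s*;', re.MULTILINE)


def read_php_settings(text):
    """Collect simple quoted string assignments from LocalSettings.php text."""
    return dict(ASSIGNMENT.findall(text))


def images_directory(settings, web_root):
    """Expand $wgScriptPath in $wgUploadPath and prefix the web root."""
    script_path = settings.get("wgScriptPath", "")
    # MediaWiki falls back to "$wgScriptPath/images" when $wgUploadPath is unset.
    upload_path = settings.get("wgUploadPath", "$wgScriptPath/images")
    url_path = upload_path.replace("$wgScriptPath", script_path)
    return web_root.rstrip("/") + url_path


sample = '''
$wgScriptPath       = "/somewiki";
$wgUploadPath       = "$wgScriptPath/images";
'''
print(images_directory(read_php_settings(sample), "/opt"))
```

If your LocalSettings.php pulls these values in from an include file, run the same function over that file instead.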

Tar up this whole directory:

[root@mediawiki ]# tar cvf all_wiki_images.tar .
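
If you would rather script this step, here is a minimal Python sketch of the same idea using the standard tarfile module. The function name and the paths in the comment are placeholders of mine. The one deliberate difference from the command above is that the archive is written outside the directory being archived, so it cannot try to include itself.

```python
import tarfile
from pathlib import Path


def archive_images(images_dir, archive_path):
    """Write an uncompressed tar of images_dir, like `tar cvf ... .`."""
    images_dir = Path(images_dir).resolve()
    archive_path = Path(archive_path).resolve()
    # Refuse to write the archive into the tree it is archiving.
    if images_dir in archive_path.parents:
        raise ValueError("write the archive outside the images directory")
    with tarfile.open(archive_path, "w") as tar:
        # arcname="." stores paths relative to the images directory.
        tar.add(images_dir, arcname=".")
    return archive_path


# Hypothetical usage; adjust both paths to your own server:
# archive_images("/opt/somewiki/images", "/tmp/all_wiki_images.tar")
```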

Now we need a dump of the MySQL database. First we need to check which database the wiki uses and which username and password it connects with.

Again, look for these lines in the same file:

$wgDBserver         = "localhost";      
$wgDBname           = "mediawikidb";                 
$wgDBuser           = "db_user";
$wgDBpassword       = "db_pass";
$wgDBprefix         = "";
$wgDBtype           = "mysql";
$wgDBport           = "5432";         



As you can see, we have all the info we need, so we can now do a dump as follows:

[root@dev01 ]# mysqldump --single-transaction -u db_user -pdb_pass mediawikidb > wikidumb-mediawikidb-03-07-2013.sql
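
To show how the LocalSettings.php values map onto the dump command, here is a small Python sketch that assembles a similar mysqldump command line from those settings. The function name and the output file name are my own choices for illustration. Unlike the command above, it passes a bare -p so that mysqldump prompts for the password instead of exposing it on the command line.

```python
import shlex


def mysqldump_command(settings, outfile):
    """Build a mysqldump command line from MediaWiki $wgDB* settings."""
    argv = [
        "mysqldump",
        "--single-transaction",   # consistent snapshot for InnoDB tables
        "-u", settings["wgDBuser"],
        "-p",                     # prompt instead of exposing the password
        settings["wgDBname"],
    ]
    # Quote every piece so the line can be pasted into a shell safely.
    return " ".join(shlex.quote(a) for a in argv) + " > " + shlex.quote(outfile)


settings = {"wgDBuser": "db_user", "wgDBname": "mediawikidb"}
print(mysqldump_command(settings, "wikidump-mediawikidb.sql"))
```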

This gives you the dump file, which we will now import into the MySQL database on the QA server.
At this point we have finished working on the original server. You may want to put up a notice asking users not to add any content, or just make the whole directory mode 700 (assuming your web server user doesn't own the directory!).

We will now need to create a database for this MediaWiki DB on the QA server:

mysql> create database mediawikidb;
mysql> GRANT ALL PRIVILEGES ON mediawikidb.* TO db_user@localhost IDENTIFIED BY 'db_pass';

Then let's import the database dump into the MySQL server:

From the command line (not the MySQL console) run:

[root@dev01 ]# mysql -u db_user -p mediawikidb < wikidumb-mediawikidb-03-07-2013.sql

It will prompt you for the password and then import the dump, which can take a while depending on how large it is.

I can now check that everything is there:

mysql> SELECT table_schema "database_name",     sum( data_length + index_length ) / 1024 /1024 "Data Base Size in MB",sum( data_free )/ 1024 / 1024 "Free Space in MB" FROM information_schema.TABLES GROUP BY table_schema;
+--------------------+----------------------+------------------+
| database_name      | Data Base Size in MB | Free Space in MB |
+--------------------+----------------------+------------------+
| c4_1               |       25047.82812500 |     590.00000000 |
| information_schema |           0.00781250 |       0.00000000 |
| mysql              |           0.63066769 |       0.00000000 |
| mediawikidb        |         127.62810230 |     420.00000000 |
+--------------------+----------------------+------------------+
4 rows in set (7.53 sec)

mysql>

As you can see, the newly imported mediawikidb database is there, about 127 MB in size, and ready to go.

Now we will set up the UWC exporter/converter to export the MediaWiki content from the database, match it with the images in the images directory we copied, and import it all into Confluence.

We will now need to go into Confluence and create a new space. Let's call it mediawiki01; that will also be the space key.



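We will need to create a settings file in the conf\ directory (mine is called confluence.mediawiki-with-history, which is what gets passed to run_cmdline.sh further down). It will look like this: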

#Tue Feb 12 07:24:53 PST 2013
current.tab.index=0
space=mediawiki01
url=10.18.97.228\:8090
trustpass=
pages=/home/boaz/mediawiki01-wiki/pages
uploadOrphanAttachments=false
pageChooserDir=
attachments=/home/boaz/mediawiki01-wiki/images
trustall=
attachment.size.max=-1
sendToConfluence=true
pattern=
login=admin
truststore=
feedback.option=true
password=password
wikitype=mediawiki

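Here is an explanation of the lines relevant to this tutorial:

space is the key of the Confluence space created above (mediawiki01).
url is the host and port of the Confluence instance.
pages is the directory holding the pages exported from MediaWiki.
attachments is the images directory copied from the MediaWiki server.
login and password are the Confluence account used for the upload.
wikitype tells UWC which set of converters to use, here mediawiki.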
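
Before running the converter, it can help to sanity-check this settings file. The Python sketch below is only an illustration: the function names are mine, the list of keys treated as required is my assumption rather than anything UWC documents, and the parser handles plain key=value lines only.

```python
import os


def load_properties(text):
    """Parse simple key=value lines from a Java-style .properties file."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # A literal colon is stored as "\:", e.g. url=10.18.97.228\:8090
        props[key.strip()] = value.strip().replace("\\:", ":")
    return props


def settings_problems(props):
    """Return human-readable problems found in the settings."""
    problems = []
    # Assumption: these keys must be non-empty for an upload to work.
    for key in ("space", "url", "login", "password", "wikitype"):
        if not props.get(key):
            problems.append("missing value for " + key)
    for key in ("pages", "attachments"):
        path = props.get(key, "")
        if not os.path.isdir(path):
            problems.append(key + " is not a directory: " + path)
    return problems
```

Calling settings_problems(load_properties(open(path).read())) on the settings file returns an empty list when nothing obvious is wrong.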



Then we will need to edit the file called converter.mediawiki.properties if you want to tweak any settings. In my case I wanted to keep the original users who created each page (if you don't turn this on, all the pages will be owned by the user importing them into the Confluence space) as well as the page histories.

To include user and timestamp data with page histories:

1. Export your MediaWiki data with the udmf property set to true. To do that, uncomment this line in the conf\exporter.mediawiki.properties file:


udmf=true




2. Install the UDMF Plugin on your Confluence instance. (Important: the username and creation date will not carry over unless this plugin is installed.)

3. In the converter.mediawiki.properties file (under conf\ ) uncomment:
Mediawiki.0004.userdate.class=com.atlassian.uwc.converters.mediawiki.UserDateConverter


4. Optionally, in your converter.mediawiki.properties, if the users in your MediaWiki are not going to be exactly the same users as in Confluence (i.e., the two are not using the same LDAP or AD), then uncomment and set to false:

Mediawiki.0004.users-must-exist.property=false
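
If you do this migration more than once, the uncommenting can be scripted. Here is a minimal Python sketch, assuming the disabled lines in the shipped properties files start with a "#" directly in front of the key; the function name is mine and not part of UWC.

```python
import re


def enable_property(text, key, value=None):
    """Uncomment `key` in .properties text, optionally replacing its value."""
    pattern = re.compile(
        r"^#[ \t]*(" + re.escape(key) + r")=(.*)$", re.MULTILINE
    )

    def activate(match):
        # Keep the value that was already there unless a new one is given.
        new_value = match.group(2) if value is None else value
        return match.group(1) + "=" + new_value

    return pattern.sub(activate, text)


sample = "#Mediawiki.0004.users-must-exist.property=true\n#udmf=true\n"
print(enable_property(sample, "Mediawiki.0004.users-must-exist.property", "false"))
```

Lines for other keys are left untouched, so the function can be applied once per property you want to turn on.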




You will also need to edit the conf\exporter.mediawiki.properties file. It's pretty self-explanatory: it mainly needs the database connection details we collected from LocalSettings.php earlier.




One step to complete before you start the import is to make sure Confluence allows remote API access.

Go into "General Configuration" and look for a checkmark by "Remote API (XML-RPC & SOAP)".
If there isn't one, make sure to add it, otherwise the import won't work.
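
One way to confirm the setting took effect is to attempt an XML-RPC login from the machine that will run the converter. This is a sketch under assumptions: it uses the confluence2 XML-RPC namespace (older Confluence versions expose confluence1 instead), the function names are mine, and a False result can also mean wrong credentials or a network problem rather than a disabled API.

```python
import xmlrpc.client


def rpc_endpoint(host_and_port):
    """Build the XML-RPC URL for a Confluence host such as 10.18.97.228:8090."""
    base = host_and_port if "://" in host_and_port else "http://" + host_and_port
    return base.rstrip("/") + "/rpc/xmlrpc"


def remote_api_enabled(host_and_port, user, password):
    """Try to log in over XML-RPC; a fault or network error means not ready."""
    server = xmlrpc.client.ServerProxy(rpc_endpoint(host_and_port))
    try:
        token = server.confluence2.login(user, password)
        server.confluence2.logout(token)
        return True
    except (xmlrpc.client.Error, OSError):
        return False
```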




Then we will run the convert/export like this:

[root@dev01 ]# ./run_cmdline.sh conf/confluence.mediawiki-with-history conf/converter.mediawiki.properties


You will see many entries roll by, and once the conversion is done it will start uploading, like this:

2013-03-06 21:28:32,888 INFO  [main] - Uploaded 2200 out of 6613 pages.
2013-03-06 21:28:36,269 INFO  [main] - Uploaded 2210 out of 6613 pages.
2013-03-06 21:28:39,702 INFO  [main] - Uploaded 2220 out of 6613 pages.
2013-03-06 21:28:42,993 INFO  [main] - Uploaded 2230 out of 6613 pages.
2013-03-06 21:28:46,214 INFO  [main] - Uploaded 2240 out of 6613 pages.
2013-03-06 21:28:49,487 INFO  [main] - Uploaded 2250 out of 6613 pages.
2013-03-06 21:28:52,771 INFO  [main] - Uploaded 2260 out of 6613 pages.
2013-03-06 21:28:56,051 INFO  [main] - Uploaded 2270 out of 6613 pages.
2013-03-06 21:28:59,435 INFO  [main] - Uploaded 2280 out of 6613 pages.
2013-03-06 21:29:02,735 INFO  [main] - Uploaded 2290 out of 6613 pages.
2013-03-06 21:29:06,083 INFO  [main] - Uploaded 2300 out of 6613 pages.
2013-03-06 21:29:09,476 INFO  [main] - Uploaded 2310 out of 6613 pages.
2013-03-06 21:29:12,878 INFO  [main] - Uploaded 2320 out of 6613 pages.
2013-03-06 21:29:16,218 INFO  [main] - Uploaded 2330 out of 6613 pages.
2013-03-06 21:29:19,468 INFO  [main] - Uploaded 2340 out of 6613 pages.
2013-03-06 21:29:23,538 INFO  [main] - Uploaded 2350 out of 6613 pages.
2013-03-06 21:29:26,804 INFO  [main] - Uploaded 2360 out of 6613 pages.
2013-03-06 21:29:29,785 INFO  [main] - Uploaded 2370 out of 6613 pages.
2013-03-06 21:29:32,740 INFO  [main] - attachment written JPAValidationSampleRegEx.png
2013-03-06 21:29:32,740 INFO  [main] - Attachment Uploaded: /home/boaz/mediawiki/images/images/9/9f/JPAValidationSampleRegEx.png
2013-03-06 21:29:32,773 INFO  [main] - attachment written JPAValidationSampleTester.png
2013-03-06 21:29:32,773 INFO  [main] - Attachment Uploaded: /home/boaz/mediawiki/images/images/6/63/JPAValidationSampleTester.png
2013-03-06 21:29:32,808 INFO  [main] - attachment written JPAValidationSample.png
2013-03-06 21:29:32,808 INFO  [main] - Attachment Uploaded: /home/boaz/mediawiki/images/images/9/96/JPAValidationSample.png
2013-03-06 21:29:32,843 INFO  [main] - Uploaded 2380 out of 6613 pages.
2013-03-06 21:29:35,873 INFO  [main] - Uploaded 2390 out of 6613 pages.



That's all. You now have the MediaWiki content sitting inside a new space on a Confluence wiki.



19 comments:

  1. Hi, thanks for this step-by-step article.
    In my MediaWiki to Confluence migration, I cannot get the images to be converted. In my LocalSettings.php, there is no $wgUploadPath value set.
    When I set the variable to the path of the images folder, the images no longer display on MediaWiki. So I commented it out again for the images to display on MediaWiki.
    Any suggestions?
    ## To enable image uploads, make sure the 'images' directory
    ## is writable, then set this to true:
    $wgEnableUploads = true;
    $wgUseImageResize = true;
    # $wgUploadPath = "/apache/html/wiki/images";
    # $wgUseImageMagick = true;
    # $wgImageMagickConvertCommand = "/usr/bin/convert";

  2. Hi,
    Thanks for the step-by-step article. In my MediaWiki to Confluence migration, I cannot get any images to migrate. In LocalSettings.php, I don't see the $wgUploadPath variable.
    When I tried to set it, the images no longer displayed on MediaWiki. Any suggestions on how to get the images migrated to Confluence?
    Mike
    ## To enable image uploads, make sure the 'images' directory
    ## is writable, then set this to true:
    $wgEnableUploads = true;
    $wgUseImageResize = true;
    #$wgUploadPath = "/apache/html/wiki/images";
    # $wgUseImageMagick = true;
    # $wgImageMagickConvertCommand = "/usr/bin/convert";

    Replies
    1. You may have an include file in there somewhere that points to it. But the bottom line is that you have to copy the images directory to the new server. To find it, you can also pick an image on the MediaWiki with a fairly unique name, such as moreunique.jpg, and then search for it on the server like this:

      # find / | grep moreunique.jpg

      The mediawiki I was working on also had many includes going to many locations.

    2. Thanks for mentioning the include. A grep of LocalSettings.php showed: require_once( "includes/DefaultSettings.php" ); and that file has:
      $wgUploadPath = false; /// defaults to "{$wgScriptPath}/images"
      How come UWC did not drill down to it?

    3. UWC is pretty primitive. You need to do some prep work, especially if you don't have a plain vanilla installation. It's nowhere near what you'd expect from a migration tool.

    4. What kind of prep work? Please advise. My MediaWiki is $wgVersion = '1.9.0'; and I don't think there was much customization.

    5. Well, look at this page, it goes through it. Mainly, edit the exporter.mediawiki.properties file and then run the exporter.

    6. I have exporter.mediawiki.properties set according to your step-by-step. Then I ran ./run_cmdline.sh -e conf/exporter.mediawiki.properties
      But no images were exported. Please advise.

    7. What do you have in your $wgScriptPath? Email me the whole properties file at minitzer at gmail.

  3. On MediaWiki, a find for an image name shows both a file and a folder:
    /apache/wiki/images/b/b9/Unique.jpg is a file
    /apache/wiki/images/thumb/b/b9/Unique.jpg is a folder?

    How will this translate on Confluence? Something like: $confluenceroot/download/attachments/###/Unique.jpg?

  4. No, that was just an example: search for an image with a more unique name on MediaWiki. Don't search for space.gif, as that could show up anywhere. Look for a not-so-common name, or upload a fake image and call it whatever you want, e.g. mike123456.jpg.

  5. As for Confluence, depending on how you set it up, it saves the images/attachments either in the database or in a directory. But why are you looking in Confluence? You're migrating FROM MediaWiki TO Confluence, no?

  6. Hi Boaz, we are trying to upload the images from MediaWiki to Confluence, but it doesn't seem to be working. We are able to upload the text, but the attachments are empty. We turned debug on and found that for all pages that have attachments the following is displayed:

    Converting Mediawiki Attachments -- starting
    DEBUG [main] - Finding Attachments
    DEBUG [main] - found attachment names: []
    INFO [main] - Converting Mediawiki Attachments -- complete

    It doesn't appear to find the attachments location even though it's set in confluenceSettings.properties.

    Thanks
    Ricky

  7. Are you sure the path to the images is correct?

  8. When we set uploadOrphanAttachments=true, all the images are uploaded into Confluence, but the pages have no attachments (all the attachments are treated as orphans, and we can see them in Confluence). The attachment path is set and is pointing to the location defined in $wgUploadPath. We can email the two properties files if that helps.

    Thanks for your response and appreciate your help.

    Thanks,
    Ricky

  9. Yes, email them to me at my last name at gmail.com

  10. Hi Boaz,
    I followed your instructions successfully. Thanks
    I do, however, encounter issues with tables.
    They are simply not converted as tables.

    Any idea?

    Thanks
    Itamar

    Replies
    1. It's been quite a while, but I was using the latest converter that was sent to me by the developers. I don't know if they made that version public or not.