dealing with exported G+ HTML posts
At least one of my fellow soon-to-be-exiles from G+ has done things with scripts and JSON exports, but I don't know how to run that, so I went with an HTML export of my posts via Takeout. I told it to give me only posts, but be aware that it still puts the photos from posts in a separate directory.
Then you'll have a zip file containing one HTML file per post. Each file name starts with the date in yyyymmdd format, and each file includes the comments.
If you're keeping these locally, you can just unzip it and use your local computer's tools to search for things.
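If your local tools don't search inside files well, a few lines of Python can do a basic search. This is only a sketch: it assumes Python 3 is installed and that the exported files are UTF-8, and the function name find_posts is made up for illustration. It matches against the raw HTML, so a phrase can also match inside markup.

```python
from pathlib import Path

def find_posts(posts_dir, phrase):
    """Return the names of .html files in posts_dir that contain phrase (case-insensitive)."""
    needle = phrase.lower()
    hits = []
    # Sorting by name gives date order, because each file name starts with yyyymmdd.
    for path in sorted(Path(posts_dir).glob("*.html")):
        # Assumption: the export is UTF-8; undecodable bytes are replaced, not fatal.
        text = path.read_text(encoding="utf-8", errors="replace")
        if needle in text.lower():
            hits.append(path.name)
    return hits
```

Calling find_posts("Posts", "sourdough") would return the matching file names, oldest first.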
I would, in any event, recommend doing a mass search & replace to remove the following:
body {font: 11pt Roboto, Arial, sans-serif; max-width: 640px; margin: 24px;}
because that max-width makes things weirdly narrow. (I use a very old version of a paid text editor that allows me to do search & replace of files in a directory without opening them all; I think their trial version will also do it, if you don't have a tool already.)
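If you don't have an editor that does this, a short Python script can do the same search & replace. Treat it as a sketch under assumptions: Python 3 is installed, the rule appears in each file exactly as quoted above, and the function name strip_narrow_rule is just an illustration. It works on raw bytes so it doesn't have to guess the file encoding.

```python
from pathlib import Path

# The rule quoted above, as bytes. Assumption: it appears verbatim in each exported file.
NARROW_RULE = b"body {font: 11pt Roboto, Arial, sans-serif; max-width: 640px; margin: 24px;}"

def strip_narrow_rule(posts_dir):
    """Remove NARROW_RULE from every .html file in posts_dir; return how many files changed."""
    changed = 0
    for path in sorted(Path(posts_dir).glob("*.html")):
        data = path.read_bytes()
        if NARROW_RULE in data:
            # Rewrite the file in place without the rule; everything else is untouched.
            path.write_bytes(data.replace(NARROW_RULE, b""))
            changed += 1
    return changed
```

Run it on a copy of the Posts directory first, for example strip_narrow_rule("Posts"), since it overwrites the files.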
Merging HTML Files
If you'd like to have them in fewer files for whatever reason, on Windows you can use the command line to merge the files. Get to the command line/command prompt/cmd.exe however you do on your OS and change directory to wherever you put the unzipped Posts directory.
If you run
copy *.html merged.html
it will copy every existing html file, in file name order, into a single new merged.html file. This is likely to be Too Big for most purposes, so you can use narrower wildcards. For instance,
copy 201801*.html merged-201801.html
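will copy all posts from January 2018 into a single file. (When I did this, I got a single stray character at the end of every merged file, just so you know. That is most likely the Ctrl+Z end-of-file marker that copy appends when it combines files in its default text mode; adding the /b switch, as in copy /b 201801*.html merged-201801.html, should avoid it.)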
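If you're not on Windows, or want to avoid the stray trailing character, the same merge can be sketched in Python. The assumptions here: Python 3 is installed, the file names start with yyyymmdd as described, and the function name merge_month is hypothetical. It copies raw bytes, so nothing gets added to the end of the file.

```python
from pathlib import Path

def merge_month(posts_dir, yyyymm):
    """Concatenate all posts whose names start with yyyymm into merged-<yyyymm>.html.

    Returns the path of the merged file, or None if no post matched.
    """
    posts_dir = Path(posts_dir)
    # Sorting by name gives date order, because each file name starts with yyyymmdd.
    sources = sorted(posts_dir.glob(f"{yyyymm}*.html"))
    if not sources:
        return None
    target = posts_dir / f"merged-{yyyymm}.html"
    with target.open("wb") as out:
        for src in sources:
            # Byte-for-byte copy: no end-of-file marker is appended.
            out.write(src.read_bytes())
    return target
```

For example, merge_month("Posts", "201801") does roughly what the copy command for January 2018 does. Like copy, it simply glues the files together, so the result contains several complete HTML documents back to back.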
I made a batch file to do all months from March 2012 (when I started using G+) to now, which I will put at the end of this post to save anyone else from generating it in a spreadsheet.
Importing in WordPress
If you want to be able to search stuff while away from your local machine, the most automated solution I found was importing into WordPress using the HTML Import 2 plugin. (I thought at first about using DW because that's searchable with a paid account, but the max-post size means that I would have had to break the merged files up smaller than by month and that's a lot to paste in. Also I'm thinking of exporting this DW to a self-hosted WP blog anyway and cross-posting, and that way everything would be in one place.)
You might want to split your posts directory into smaller chunks first; I subdivided mine by year. (Edit: not the merged files, the original ones, this is a post-by-post import.) Then upload to a directory on your server.
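Splitting by year can be scripted too. This is a minimal sketch, assuming Python 3 and file names that start with yyyymmdd; the function name split_by_year is made up. It moves each original post into a subdirectory named for its year and leaves anything else (such as merged files) where it is.

```python
import shutil
from pathlib import Path

def split_by_year(posts_dir):
    """Move each post into a subdirectory named after the first four characters of its name.

    Returns a dict mapping year -> number of files moved.
    """
    posts_dir = Path(posts_dir)
    moved = {}
    # sorted() collects the file list up front, before anything is moved.
    for path in sorted(posts_dir.glob("*.html")):
        year = path.name[:4]
        if not year.isdigit():
            continue  # skip files that don't start with a date, e.g. merged-*.html
        year_dir = posts_dir / year
        year_dir.mkdir(exist_ok=True)
        shutil.move(str(path), str(year_dir / path.name))
        moved[year] = moved.get(year, 0) + 1
    return moved
```

After split_by_year("Posts"), you would upload Posts/2012, Posts/2013 and so on as separate directories.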
Go into Settings / HTML Import in WordPress.
- Under Files, enter the directory.
- Content: choose HTML tag and enter "body".
- If you check "Import linked images" and upload the Photos directory from your takeout, it will do a pretty good job of putting the photos you posted into their posts. However, it will also attempt to import external images, the ones that come with link previews. So I decided not to bother.
- Do not check "Clean up bad ( Word, Frontpage ) HTML"; it will eat formatting necessary for your comments to be readable.
- Title & Metadata: Select title by HTML tag, specifically "title". Set status to "private" (knowing how we used G+). Set timestamps to "custom field".
- Custom Fields: Select date by HTML tag, tag = span, Attribute = itemprop, Value = dateCreated.
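If you want to check what those title and date settings will pick up before importing, the same lookup can be sketched with Python's standard html.parser. This is not how the plugin works internally, just an illustration of the selectors above: the title tag, and a span whose itemprop attribute is dateCreated. The class and function names are hypothetical, and it assumes the date is plain text inside that span.

```python
from html.parser import HTMLParser

class PostMetaParser(HTMLParser):
    """Collect the text of <title> and of <span itemprop="dateCreated">."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.date_created = ""
        self._target = None  # name of the field the current text belongs to, if any

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._target = "title"
        elif tag == "span" and dict(attrs).get("itemprop") == "dateCreated":
            self._target = "date_created"

    def handle_endtag(self, tag):
        # Stop collecting at the end of either element.
        if tag in ("title", "span"):
            self._target = None

    def handle_data(self, data):
        if self._target:
            setattr(self, self._target, getattr(self, self._target) + data)

def post_meta(html_text):
    """Return (title, date_created) found in one post's HTML, stripped of whitespace."""
    parser = PostMetaParser()
    parser.feed(html_text)
    parser.close()
    return parser.title.strip(), parser.date_created.strip()
```

Feeding it the text of one exported post shows the title and timestamp string that the import should end up using; if either comes back empty, the settings probably won't match that file either.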
Save settings, and then import files. This is quite efficient; my imports only hung twice and I just restarted and they worked fine.
And now you have your G+ posts in a blog-like form that has a built-in search engine and that (if you've set it to private) only you can see when you're logged in.
copy 201203*.html merged-201203.html
copy 201204*.html merged-201204.html
copy 201205*.html merged-201205.html
copy 201206*.html merged-201206.html
copy 201207*.html merged-201207.html
copy 201208*.html merged-201208.html
copy 201209*.html merged-201209.html
copy 201210*.html merged-201210.html
copy 201211*.html merged-201211.html
copy 201212*.html merged-201212.html
copy 201301*.html merged-201301.html
copy 201302*.html merged-201302.html
copy 201303*.html merged-201303.html
copy 201304*.html merged-201304.html
copy 201305*.html merged-201305.html
copy 201306*.html merged-201306.html
copy 201307*.html merged-201307.html
copy 201308*.html merged-201308.html
copy 201309*.html merged-201309.html
copy 201310*.html merged-201310.html
copy 201311*.html merged-201311.html
copy 201312*.html merged-201312.html
copy 201401*.html merged-201401.html
copy 201402*.html merged-201402.html
copy 201403*.html merged-201403.html
copy 201404*.html merged-201404.html
copy 201405*.html merged-201405.html
copy 201406*.html merged-201406.html
copy 201407*.html merged-201407.html
copy 201408*.html merged-201408.html
copy 201409*.html merged-201409.html
copy 201410*.html merged-201410.html
copy 201411*.html merged-201411.html
copy 201412*.html merged-201412.html
copy 201501*.html merged-201501.html
copy 201502*.html merged-201502.html
copy 201503*.html merged-201503.html
copy 201504*.html merged-201504.html
copy 201505*.html merged-201505.html
copy 201506*.html merged-201506.html
copy 201507*.html merged-201507.html
copy 201508*.html merged-201508.html
copy 201509*.html merged-201509.html
copy 201510*.html merged-201510.html
copy 201511*.html merged-201511.html
copy 201512*.html merged-201512.html
copy 201601*.html merged-201601.html
copy 201602*.html merged-201602.html
copy 201603*.html merged-201603.html
copy 201604*.html merged-201604.html
copy 201605*.html merged-201605.html
copy 201606*.html merged-201606.html
copy 201607*.html merged-201607.html
copy 201608*.html merged-201608.html
copy 201609*.html merged-201609.html
copy 201610*.html merged-201610.html
copy 201611*.html merged-201611.html
copy 201612*.html merged-201612.html
copy 201701*.html merged-201701.html
copy 201702*.html merged-201702.html
copy 201703*.html merged-201703.html
copy 201704*.html merged-201704.html
copy 201705*.html merged-201705.html
copy 201706*.html merged-201706.html
copy 201707*.html merged-201707.html
copy 201708*.html merged-201708.html
copy 201709*.html merged-201709.html
copy 201710*.html merged-201710.html
copy 201711*.html merged-201711.html
copy 201712*.html merged-201712.html
copy 201801*.html merged-201801.html
copy 201802*.html merged-201802.html
copy 201803*.html merged-201803.html
copy 201804*.html merged-201804.html
copy 201805*.html merged-201805.html
copy 201806*.html merged-201806.html
copy 201807*.html merged-201807.html
copy 201808*.html merged-201808.html
copy 201809*.html merged-201809.html
copy 201810*.html merged-201810.html
copy 201811*.html merged-201811.html
copy 201812*.html merged-201812.html
copy 201901*.html merged-201901.html
copy 201902*.html merged-201902.html
copy 201903*.html merged-201903.html
Copy that into a text file, with one copy command per line, save it with a file name ending in .bat in the same directory as your posts, and then just double-click it.
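If your date range is different, the list of commands doesn't have to come from a spreadsheet. Here is a small Python sketch that generates it; it assumes Python 3, and the function name monthly_copy_commands is just for illustration. The output uses the same copy pattern as the batch file above.

```python
def monthly_copy_commands(start, end):
    """Return one 'copy' line per month from start to end.

    start and end are (year, month) tuples, and both ends are included.
    """
    year, month = start
    lines = []
    while (year, month) <= end:
        stamp = f"{year}{month:02d}"  # e.g. 201203
        lines.append(f"copy {stamp}*.html merged-{stamp}.html")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return lines
```

Calling monthly_copy_commands((2012, 3), (2019, 3)) gives the March 2012 to March 2019 list, one command per month; joining the lines with newlines and saving the result as a .bat file gets you the same kind of batch file.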
(I remembered the poll code! I wonder if I can make a bookmarklet that will ask me if I did when I hit the "Post" button. No, nevermind, that's enough computer-y stuff for one weekend.)
+1 (thumbs-up, I see you, etc.)?