How to programmatically clean up old drafts and purge the trash in Confluence with Python

 Hello!

In this article, we will learn how to do maintenance with Python and the Confluence REST API, and how to run the script from your CI server, e.g. Atlassian Bamboo. The script is also very easy to automate and to extend.

Nowadays, many Confluence instances have collaborative editing enabled or have gone a long time without maintenance. For example, in our instance, the DB backup in text format shrank by ~16% after cleanup.

 

Let’s start with the trash cleaner.

  1. The algorithm is simple: get all pages from the trash of a space, then remove each of them.
  2. The REST API reference is located here, e.g. https://docs.atlassian.com/ConfluenceServer/rest/6.11.0/
  3. The implementation language is Python with the requests module, which keeps things simple.

To make the raw REST API more comfortable to use, I’m using the Python module atlassian-python-api. Hence the code is small and simple.

def clean_pages_from_space(confluence, space_key):
    """
    Remove all pages from trash for related space
    :param confluence:
    :param space_key:
    :return:
    """
    limit = 500
    while True:
        # Always read from start=0: every removal shrinks the trash,
        # so the next batch moves to the front of the list.
        values = confluence.get_all_pages_from_space_trash(space=space_key, start=0, limit=limit)
        if len(values) == 0:
            print("For space {} trash is empty".format(space_key))
            break
        for value in values:
            print(value['title'])
            confluence.remove_page_from_trash(value['id'])

Feel free to use the full example script: https://github.com/atlassian-python-api/atlassian-python-api/blob/master/examples/confluence-trash-cleaner.py
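One thing worth noting about the trash loop: it always asks for the first batch and relies on every deletion shrinking the trash. If deletions silently fail (for example because of missing permissions), the same batch keeps coming back and the loop never ends. Below is a minimal sketch of a guarded variant. `FakeConfluence`, `purge_trash` and `max_batches` are hypothetical names introduced for illustration; the fake client only mimics the two methods used in this article so that the sketch runs without a server.

```python
class FakeConfluence:
    """Hypothetical in-memory stand-in for the real client (illustration only)."""

    def __init__(self, trashed):
        self.trashed = list(trashed)

    def get_all_pages_from_space_trash(self, space, start=0, limit=500):
        return self.trashed[start:start + limit]

    def remove_page_from_trash(self, page_id):
        self.trashed = [p for p in self.trashed if p["id"] != page_id]


def purge_trash(confluence, space_key, limit=500, max_batches=1000):
    """Purge the trash of a space; stop if a batch makes no progress."""
    removed = 0
    previous_ids = None
    for _ in range(max_batches):
        pages = confluence.get_all_pages_from_space_trash(
            space=space_key, start=0, limit=limit)
        if not pages:
            break
        ids = [page["id"] for page in pages]
        if ids == previous_ids:
            # The same batch came back, so deletions are not taking effect.
            raise RuntimeError(
                "Trash of space {} is not shrinking".format(space_key))
        previous_ids = ids
        for page_id in ids:
            confluence.remove_page_from_trash(page_id)
            removed += 1
    return removed
```

With the real client, the same function could be called as `purge_trash(confluence, "DEV")`, where `"DEV"` is a placeholder space key.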

 

 

The next step is to clean up draft pages.

Of course, in this use case we need a threshold to determine how old a draft must be before we remove it.

Therefore I am using a variable:

import datetime

# Drafts not modified for more than this many days are removed
DRAFT_DAYS = 30

def clean_draft_pages_from_space(confluence, space_key, count, date_now):
    """
    Remove draft pages from space using datetime.now
    :param confluence:
    :param space_key:
    :param count:
    :param date_now:
    :return: int counter
    """
    pages = confluence.get_all_draft_pages_from_space(space=space_key, start=0, limit=500)
    for page in pages:
        page_id = page['id']
        draft_page = confluence.get_draft_page_by_id(page_id=page_id)
        last_date_string = draft_page['version']['when']
        last_date = datetime.datetime.strptime(last_date_string.replace(".000", "")[:-6], "%Y-%m-%dT%H:%M:%S")
        if (date_now - last_date) > datetime.timedelta(days=DRAFT_DAYS):
            count += 1
            print("Removing page with page id: " + page_id)
            confluence.remove_page_as_draft(page_id=page_id)
            print("Removed page with date {}".format(last_date_string))
    return count

The full example script: https://github.com/atlassian-python-api/atlassian-python-api/blob/master/examples/confluence-draft-page-cleaner.py
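The date parsing in the draft cleaner strips the `.000` milliseconds and the last six characters (the UTC offset) before calling `strptime`, so it would fail on a timestamp with non-zero milliseconds. Assuming the `when` field looks like `2019-01-15T10:30:00.000+03:00` (inferred from that slicing, not checked against the REST API reference), a sketch that parses the whole string and keeps the offset could look like this. `is_stale` is a hypothetical helper name, and the `%z` handling of `+03:00` needs Python 3.7 or newer.

```python
import datetime


def is_stale(when, now, max_age_days=30):
    """Return True if the timestamp `when` is more than max_age_days before `now`."""
    # %f accepts the milliseconds, %z accepts an offset such as +03:00.
    last_modified = datetime.datetime.strptime(when, "%Y-%m-%dT%H:%M:%S.%f%z")
    # `now` must be timezone-aware, otherwise the subtraction raises TypeError.
    return (now - last_modified) > datetime.timedelta(days=max_age_days)
```

A timezone-aware current time can be obtained with `datetime.datetime.now(datetime.timezone.utc)`.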

 

That’s all. I hope it helps you clean up your Confluence easily. Next time I will show how to clean up page versions and attachment versions, because those use cases free up a lot of disk space.

P.S. Let's set it up in the CI system so that it can be delegated to other teammates.
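When the script runs from a CI job, it is usually more convenient to pass the connection settings through environment variables than to hard-code them. Here is a minimal sketch; the variable names `CONFLUENCE_URL`, `CONFLUENCE_USER` and `CONFLUENCE_PASSWORD` and the helper `read_settings` are assumptions made for illustration, not something Bamboo or the library defines.

```python
import os


def read_settings(env=None):
    """Collect connection settings from environment variables (hypothetical names)."""
    env = os.environ if env is None else env
    names = ("CONFLUENCE_URL", "CONFLUENCE_USER", "CONFLUENCE_PASSWORD")
    missing = [name for name in names if not env.get(name)]
    if missing:
        # A non-zero exit makes the CI job fail with a readable message.
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
    return {
        "url": env["CONFLUENCE_URL"],
        "username": env["CONFLUENCE_USER"],
        "password": env["CONFLUENCE_PASSWORD"],
    }


# With atlassian-python-api installed, the client could then be built as:
#   from atlassian import Confluence
#   confluence = Confluence(**read_settings())
```

Keeping the credentials in the CI server's variable store means teammates can run or schedule the job without ever seeing the password.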


 

Cheers,

Gonchik Tsymzhitov
