
My Initial Attempts at Managing Unstructured Data – Part 1

Chris Farmer Data Insight, Managed IT Support Services

I thought I’d share a little bit of my history and the battles that unstructured data causes within IT infrastructures.

Let me just clarify what I mean by unstructured data.

  • Unstructured data is files of any kind, stored on a file system and shared with users.
  • Semi-structured data could be your email system.
  • Structured data could be your databases.

During my time working for large multinationals, unstructured data was growing considerably faster than anything else. This is no different from any other business out there, where system admins and IT managers deal with the uncontrolled, unstructured data being dumped by their users.



So, I’m going to go over a few things that I did in my corporate life to try to resolve this situation, successfully or not.

To help you make sense of what follows, here is some background.

I worked for a multinational business with several sectors, including healthcare and energy. Each sector spans the globe and, in total, the company has around 400k employees.

I left in 2013 to make my own mark in the world, and because I wanted to move away from the stifling corporate politics and work for a new, agile company whose ideologies also match my own.

So, without further ado.

Document Management System (DMS) (2001)

As part of a worldwide initiative started 12 months before I joined, a pilot DMS had been introduced and, about 6 months after I started, it was rolled out into my sector.

The approach was great. Each Managing Director (CEO) was the sponsor for their Operating Company in that sector, and the IT Department handled only the technical installation and future maintenance. It had been decided that the users should be empowered, so they had:

• Administrators, who had complete control over their departments’ areas and needs.

• Super Users, who could create folders and sub-folders and assign permissions to those folders.

• Standard users, who could add, edit and delete documents.

The particular Operating Company that I worked for had thought about structure and embarked on a ‘3 clicks to any data’ strategy within the DMS. It was a clever idea and one that people liked, but in the end it would never work.

The MD/CEO believed in the DMS and decided that training was key, so everyone did mandatory training.

The hype was short-lived: the enthusiasm of the Administrators and Super Users soon waned under the constant demands of their users.

The process of editing a document or adding a new one involved multiple steps, nothing like the DMS platforms of today.

Soon, IT found itself back in control and the users started calling the platform ‘dead link’.

It is still in operation today, but after 13 years people were still saying, ‘It’s somewhere in dead link!’ It was a good and valid attempt, but the culture for it just wasn’t, and still isn’t, in that business. In the end, the users lost confidence and the battle was lost. For users who are under pressure to perform and hit targets, it’s so much easier just to save files in a shared folder.

Where they will stay, for evermore.

Working Group (2002)

After the DMS issue, the Director of Quality was asked to look at a solution to help. A working group was formed from R&D, manufacturing and other departments.

The group created silos of information that loosely mapped to unstructured, semi-structured and structured data, and also wanted to know how relevant the data was. Was it created last week? When was it last modified or read? Is it needed again?
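As an illustration only (this is not something the working group had or used), the questions above could be asked of a file share with a short script. The sketch below assumes the share is reachable as an ordinary directory and that the file system records modification and access times reliably, which is not always true: access times in particular are often disabled or coarse. The function names `file_ages` and `stale_files` are made up for this example.

```python
import os
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def file_ages(root, now=None):
    """Yield (path, days_since_modified, days_since_accessed) for each file under root."""
    now = time.time() if now is None else now
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                st = path.stat()
            except OSError:
                continue  # file vanished or is unreadable; skip it
            yield (
                path,
                (now - st.st_mtime) / SECONDS_PER_DAY,
                (now - st.st_atime) / SECONDS_PER_DAY,
            )


def stale_files(root, max_age_days=365, now=None):
    """Return (path, days_since_modified) for files older than max_age_days, oldest first."""
    rows = [(p, m) for p, m, _a in file_ages(root, now) if m > max_age_days]
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

A report like this answers "when was it last modified or read?" but not "is it needed again?", which still needs someone from the business to decide.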

Weeks went by as discussion after discussion occurred, with no conclusion being reached or action being taken.

In hindsight, the fundamental problems turned out to be resourcing and not knowing where the biggest problem actually lay.

As manufacturing was the focus of the business, the assumption was made, with no hard evidence, that manufacturing must be the biggest contributor. And manufacturing needed to move out of the dark ages and do it right. But ‘right’ is expensive, so the decision then was, ‘let’s just not do it.’

Nearly 10 years on, manufacturing is now right, after lots of investment in software developers. As it turned out, just before I left, manufacturing data accounted for only a few percent of total growth per year, with the major growth now in emails and documents from the offices.

This is now mainly down to the need to comply with regulatory requirements and to users being paranoid about deleting the wrong thing.

Manual Growth Monitoring (2003 – 2005)

During the working group we monitored growth on our BlueArc NAS. This was done the old-fashioned way: Excel and right-clicking on the folders being monitored. This was OK. I spent most of my Mondays playing with it, it allowed me to surf a little, and there were only a couple of TB to monitor.
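For comparison, the same Monday routine can be sketched in a few lines of script. This is a hypothetical example, not what I ran at the time: it assumes the monitored folders are reachable as normal paths, and the names `folder_size_bytes` and `record_sizes` and the CSV layout are invented for illustration.

```python
import csv
import os
from datetime import date


def folder_size_bytes(folder):
    """Sum the sizes of all files under folder, including sub-folders."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(folder):
        for name in filenames:
            try:
                total += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                pass  # skip files that disappear or cannot be read
    return total


def record_sizes(folders, csv_path, today=None):
    """Append one (date, folder, bytes) row per folder to a CSV log and return the rows."""
    today = today or date.today().isoformat()
    rows = [(today, folder, folder_size_bytes(folder)) for folder in folders]
    with open(csv_path, "a", newline="") as fh:
        csv.writer(fh).writerows(rows)
    return rows
```

Run once a week from a scheduler, this builds up a log that still opens in Excel, so the growth chart stays the same while the right-clicking goes away.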

As it was such a bore, I would eventually only do this periodically, and more often when we were due a storage hardware refresh. As luck would have it, my BlueArc wouldn’t support Microsoft Active Directory 2003, so I got a new set of controllers and more storage, and I dodged a bullet on having to request more capacity this time around.

But as the data grew, another solution was required. I found one eventually, but it wasn’t perfect…

Read Part Two

Find me on Twitter @novco007
