The Big Data buzz keeps telling IT managers to be ready for ever-growing volumes of information. The Open Data community keeps telling IT managers, especially those at public institutions, to be ready to feed the world with invaluable data.
But there are not many answers about what the data actually is and how to “open” it (access it and make it accessible). What we mostly get instead are neat and often expensive tools and approaches to store data, analyze data, and make data flow.
One main issue with the “what” aspect is that the data is generally not entirely available in the enterprise IT system, and even when it is, it is not readily accessible for every kind of work. Some of it is freely available on the Internet; sometimes even the enterprise’s own data is available there, but scattered and unstructured across web pages.
I recently met with a very promising startup that aims to contribute to both the “what” and the “how”: import•io. Their motto: “Transform information from the web into useable data”.
They build a platform that transforms websites into tables of data, making them as accessible as a spreadsheet or an API. A very clever aspect of this platform is that you teach it to read web pages simply by surfing with a browser and highlighting the interesting data on the pages. The platform then does the heavy lifting, aggregating data from different pages into always up-to-date datasets, ready to be used directly or accessed through the APIs.
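To make the idea of “a website as a table” concrete, here is a minimal sketch in Python. It is not how import•io works internally and it uses none of their API; it only relies on the standard library, and the names `TableExtractor`, `page_to_records` and `SAMPLE_PAGE` are invented for the illustration. It assumes the interesting data already sits in a plain HTML table whose first row holds the column names, which is the easy case.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row currently being read, or None
        self._cell = None   # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def page_to_records(html):
    """Turn the table rows of a page into a list of dicts keyed by header."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Hypothetical page content, standing in for a real web page.
SAMPLE_PAGE = """
<table>
  <tr><th>product</th><th>price</th></tr>
  <tr><td>Kettle</td><td>24.90</td></tr>
  <tr><td>Toaster</td><td>31.50</td></tr>
</table>
"""

records = page_to_records(SAMPLE_PAGE)
for record in records:
    print(record["product"], record["price"])
```

Real pages are rarely this tidy, which is precisely where a tool that learns from highlighted examples earns its keep: the hand-written parser above would have to be rewritten for every new page layout.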
One interesting aspect of such a solution is its ability to mix information sources as the project goes, rather than building a huge, definitive platform up front for a fixed set of external data. Sources can grow in number or in volume with nearly no limits (some users already have thousands of sources; some data tables are small, some contain thousands of rows), and of course the project can still assemble in-house data internally as usual. And as the number of sources grows, the economics of the solution keep improving, because each source is quite light to build, as with most SaaS, instead of being an upfront investment. Moreover, the marginal cost of setting up a new source is itself reduced by the simplicity of use and by the fact that less experienced profiles can do it.
One aspect that made me reconsider some of my convictions is the speed and simplicity with which it addresses something that is usually not that simple. Building an API on your own data is so simple and fast with the import•io browser that it could easily be done better with such a platform than with in-house development.
More than once I have had the chance to design the architecture for a mobile version of a website, and it often ended up with services being shared or developed and new facades being built. I must say that today I would spend some time asking myself about the right mix between building in-house services and simply setting up new datasets.
This new kind of approach reminds me of the feeling I had when discovering some of the hosting solutions that profoundly shook up the way we saw the management of a website.
Another aspect, one that reinforced convictions I already held, concerns the company that built this platform. It’s a 21st-century company, building an impressive and ambitious business from scratch, and fast. The platform is built and monitored on top of other platforms and services that are accessed and integrated directly on the web. They add users exponentially every month, and it makes them stronger, not scared; it makes the platform grow and not stall; it makes the tools more effective and not overwhelmed. Everything is designed to grow like water lilies, the web way. And to do that, using SaaS/IaaS solutions assembled over the Internet is a promising path.