
Big Data

Thematic Paper


Thematic papers

The goal of the thematic papers is to present Flemish scientific research internationally. They focus on fundamental and applied research.

The thematic papers are published by Research in Flanders, a project run by Flanders Knowledge Area.

The project Research in Flanders is funded by the Flemish Government, Department of Foreign Affairs.

Flanders Knowledge Area supports, through different projects, the internationalization of higher education in Flanders, Belgium.

www.researchinflanders.be
www.flandersknowledgearea.be

For this thematic paper we talked to:

Tom Ameloot, Postdoctoral Fellow at the Databases and Theoretical Computer Science Research Group (DTCSRG), Hasselt University (photo: Tom Ameloot's Research Team)

Jabran Bhatti, Project Leader at Televic

Johannes Cottyn, Assistant Professor in Automation (Ghent University) and Project Leader at XiaK (Centre of eXcellence in Industrial Automation Kortrijk) (Ghent University and Howest, University College West Flanders)

Thomas Crombez, Researcher and Lecturer in Theatre History, the Royal Academy of Fine Arts, Artesis Plantijn University College Antwerp

Jan Dewilde, Librarian at the Royal Conservatoire of Antwerp, Artesis Plantijn University College Antwerp

Rudy Gevaert, System Developer, Ghent University

Yves Moreau, Professor in Bioinformatics at the University of Leuven and researcher at the iMinds Future Health Department, University of Leuven

Bart Vanhaelewyn, Media Consumption Analyst at iMinds MICT (Ghent University) and iMinds Media

Katrijn Vannerum, Project Manager at Big N2N (Ghent University) and Systems Biologist at the Flemish Life Sciences Research Institute (VIB)

Wilfried Verachtert, Manager of the Exascience Life Lab and High Performance Computing Project Manager at Imec, University of Leuven



Back in September 2014 the internet reached a historic milestone: in that month the World Wide Web crossed the proverbial frontier of one billion websites. The web needed only 23 years to grow from a single, solitary web page in August 1991 to a whopping 1,000,000,000 websites in 2014.

Although the internet is still growing steadily as we speak, it is implausible that this trend will continue for years to come, as individual websites are increasingly losing ground to large web portals like Google, Facebook and Twitter, which do everything in their power to keep us on their pages. But the sheer quantity of data buzzing around the electronic highway keeps on exploding, to mythical proportions at that. Experts now estimate that roughly 5 billion gigabytes of data are added to the web every 10 seconds...

All these huge figures mean that the mountain of useful digital data - called big data, the buzzword of recent years - is growing every second. This is primarily down to three factors. First, there is old analogue data, often kept on tangible and fragile data carriers like paper, CDs, floppy disks and 35mm film; by now a large part of it has been digitised so that it is easier to preserve and access. Secondly, modern science can truly measure anything there is to measure - think, for example, of reading someone's DNA or detecting elementary particles in a particle accelerator. Thirdly, the fact that our lives play out more and more in cyberspace - Google is now able to predict when we need new shoes, for instance - has turned the information about our digital behaviour into big data as well.

The enormous amount of data that’s available, then, offers scores of possibilities for science we don’t even know about yet, while at the same time offering a glimpse of ultra-useful, cost- and even life-saving applications - health apps that monitor your physical condition or your eating and drinking habits, for example.

On top of all this, big data is much more of an alpha than an omega when it comes to scientific research: by placing databases side by side with the help of powerful computers, or by joining them together, scientists can identify correlations that may ultimately lead to new questions, which can then be taken up by (other) scientists.

Flanders is at the centre of basic research that works with big data as its starting point, and at the same time capitalises on that digital data with new applications. And besides this, every association or institute with an archive worth the name is involved in one digitisation project or other. It is these three aspects of data management we will be addressing in this dossier.

Big Data: life in a digitised world


Digitisation

© ConsErfgoed


However well an old-fashioned archive may be maintained, ventilated and guarded, it can never be ruled out that an unexpected disaster destroys a large portion of the analogue information inside, stored on paper, photographs, film or in objects (paintings, for instance). So digitising all that analogue material is an absolute must. This belief has moved every association and institute with an archive worth the name to set up digitisation projects all over the place. Besides, digitised archives have the undeniable advantage that they are much easier to consult and search - even from across the world!

Oral tradition

But how do we digitise information that was never stored on a physical data carrier in the first place, not even in the analogue era - the information people tell each other? Tricky though this may seem, it is exactly the challenge of the Belgium is Happening project, set up by the Royal Academy of Fine Arts in Antwerp, which is part of the city's Artesis Plantijn University College. The aim of the project is to put together an online archive with as much data as possible about performance culture in Belgium - i.e. performing arts, performance art and happenings.

For Belgium is Happening, students of AP University College Antwerp have set out to document and visualise as many events in post-war performance culture as possible. 'We're not only trying to put all of these events into a neat chronological order,' explains Thomas Crombez, researcher and lecturer in Theatre History at the Royal Academy of Fine Arts (Artesis Plantijn University College Antwerp). 'By transcribing the interviews we have conducted, we want to contribute to the written history of a unique and major part of post-war art, literature and theatre history.'

The website where the archive can be freely consulted not only serves as an environment for users to obtain documentation, but also as a collaboration platform. 'Students and lecturers can add information and correct each other,' Crombez explains further. 'The amount of digitised material (there are about 2,300 events online at the moment) also offers new possibilities in terms of consultation, searchability and visualisation. We invite our students to present their material on timelines and network diagrams.'
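Such timelines and network diagrams can be generated straight from event records like these. A minimal sketch in Python, with invented events and fields (our illustration, not the project's actual data model): a timeline is a chronological sort, and a network diagram links artists who appear in the same event.

```python
from collections import Counter
from itertools import combinations

# Hypothetical event records, loosely modelled on the kind of data
# Belgium is Happening collects (titles, names and fields are invented).
events = [
    {"year": 1966, "title": "Happening at the docks", "artists": ["A. Artist", "B. Builder"]},
    {"year": 1967, "title": "Poetry evening", "artists": ["B. Builder"]},
    {"year": 1968, "title": "Street action", "artists": ["A. Artist", "C. Composer"]},
]

# Timeline: order the events chronologically.
for e in sorted(events, key=lambda e: e["year"]):
    print(f"{e['year']}: {e['title']}")

# Network diagram: artists are nodes; an edge connects two artists
# whenever they took part in the same event.
edges = Counter()
for e in events:
    for pair in combinations(sorted(e["artists"]), 2):
        edges[pair] += 1
print(edges.most_common())
```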

Flanders’ largest music library

Another department of Artesis Plantijn University College Antwerp, the Royal Conservatoire of Antwerp, is also working on its own digitisation project, ConsErfgoed, which will digitise the library of the conservatoire and then put it online. ‘Our library has about 600,000 volumes, mainly musical scores,’ says the Conservatoire’s librarian Jan Dewilde. ‘This means our library actually has the largest collection of music in Flanders. The oldest work we have is a Gregorian manuscript dating from as far back as the 13th century, while our most recent pieces were composed yesterday, so to speak.’

The library of the Conservatoire is in the first place a private one for the university college, but thanks to its remarkably rich collection of historic music, it became the very first recognised heritage library in Flanders. ‘It also serves as a public music library where people from outside can come and consult our collection and borrow items,’ says Dewilde.

More information

Belgium is Happening, Artesis Plantijn University College Antwerp: www.belgiumishappening.be

Library digitisation project, Royal Conservatoire Antwerp: www.libraryconservatoryantwerp.be/erfgoed/


Big Data

© Big N2N


The development of next-generation sequencing technology, which allows organisms' genetic and other molecular information to be read in no time, has provided science with great possibilities, but it also poses major challenges. The ever shorter time needed to, e.g., analyse large pieces of DNA or groups of proteins, and the decreasing cost of these analyses, have produced a real tsunami of molecular data. It is all very interesting, no doubt - as it tells us something about the traits we were born with and how much risk we have of developing certain illnesses - but the search for meaningful information in the gigantic heap of data that sequencers spew out is turning more and more into looking for a needle in a haystack.

Single-cell genomics

Put differently: the fact that molecular biology has also entered the big data era does not mean that from now on everything will take care of itself. Researchers of the SymBioSys centre at the University of Leuven have taken up the challenge. They are using molecular big data for so-called systems biology, trying to understand specifically how mutations present from birth can develop into genetic conditions, or how mutations occurring during our lifetime cause cancer.

'Systems biology tackles the modern molecular way of doing medicine in an integrative way by avoiding the focus on the effect of a single mutation on a single protein on a single symptom of a single disease,' says Yves Moreau, professor in Bioinformatics at the University of Leuven and researcher at the iMinds Future Health Department. 'Instead we collect comprehensive data across the entire genome to create a systemic view of diseases. This approach has been made possible by new technology that allows us, for example, to sequence a significant fraction of, or even an entire, genome, or to measure the activity of all genes in a given pathological state, and so on. These new technologies have transformed biology from a data-poor and labour-intensive science into a highly automated big-data science.'

The Human Genome Project, completed in 2000, sequenced a human genome at a cost of about € 3 billion. By 2007, the genome of James Watson, co-discoverer of the double helix structure of DNA, was sequenced at a cost of less than € 1 million. Today, large centres can sequence individual genomes for about € 1,000. While the data of an individual genome can fit onto one DVD, large genome projects nowadays aim at sequencing 10,000 to 100,000 genomes. 'Analysing this data is computationally intensive,' says Moreau. 'It leads to severe bottlenecks in terms of computing power, data storage and data transfer bandwidth. Sequencing technology has truly led to a data explosion.'
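The 'one DVD' claim and the looming storage bottleneck are easy to check with back-of-the-envelope arithmetic. A short sketch; the 30x sequencing coverage and one-byte-per-base figures are our own assumptions, not Moreau's:

```python
# Back-of-the-envelope figures (assumptions are ours, not Moreau's).
genome_bases = 3.2e9    # approximate length of the human genome
bits_per_base = 2       # A, C, G, T fit in 2 bits
coverage = 30           # typical sequencing depth for a whole genome

finished_genome_gb = genome_bases * bits_per_base / 8 / 1e9
raw_reads_gb = genome_bases * coverage / 1e9   # ~1 byte per sequenced base

print(f"finished genome: ~{finished_genome_gb:.1f} GB - fits on one DVD")
print(f"raw reads at {coverage}x coverage: ~{raw_reads_gb:.0f} GB per person")
print(f"a 100,000-genome project: ~{100_000 * raw_reads_gb / 1e6:.0f} PB of raw data")
```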

Moreau and his colleagues at SymBioSys help tackle these issues by developing new algorithms to manage all this data. They have been pioneering single-cell genomics. Moreau elaborates, 'This is a new frontier in molecular biology where instead of analysing the genome of a patient or malignancy, we focus on genomic differences between individual cells. For example, in cancer, initial treatment will often wipe out the vast majority of malignant cells, but some drug-resistant cancer cells might survive the treatment, start growing again and lead to relapse. How are these cells different at the level of their genome and how does this explain which cells are sensitive to treatment and which ones are resistant? How can we improve treatment to make sure all tumour cells are wiped out? Analysis of the genome sequence of individual cells can help pinpoint relevant mutations, but this means that in the future cancer therapy will not only require sequencing of the genome of the tumour, but rather of hundreds of individual tumour cells. This in turn will lead to more data explosion with unprecedented computational requirements.'
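In miniature, the comparison Moreau describes can be expressed with simple set operations. A sketch with invented per-cell mutation calls (the cells and mutation names below are purely illustrative):

```python
# Hypothetical per-cell mutation calls (invented for illustration only).
sensitive_cells = [
    {"TP53:R175H", "KRAS:G12D"},
    {"TP53:R175H"},
]
resistant_cells = [
    {"TP53:R175H", "KRAS:G12D", "EGFR:T790M"},
    {"TP53:R175H", "EGFR:T790M"},
]

# Mutations found in every resistant cell but in no sensitive cell are
# candidate explanations for the resistance.
candidates = set.intersection(*resistant_cells) - set.union(*sensitive_cells)
print(candidates)  # {'EGFR:T790M'}
```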

Computational power

The computing methods Moreau and his colleagues are using to try and get to the bottom of the big data mountain are based on the processing power of computers. Research centre Imec in Leuven is working hard on ways to speed up the interpretation of sequenced data and even automate it wherever possible. Imec houses the Exascience Life Lab, which uses the services of a few supercomputers alongside those of human researchers. At the lab they are trying to make DNA analysis a lot more efficient by changing the way computers process their data. 'Whenever computers analyse DNA samples, they always need to compare them to a piece of reference DNA,' says Wilfried Verachtert, manager of the Exascience Life Lab and High Performance Computing project manager at Imec. 'This amounts to an immense jigsaw puzzle that only the brute force of a computer can solve, by repeatedly matching individual pieces of DNA and re-sorting the letters of the DNA code. The process goes on until all the DNA in the sample has been sequenced.'
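In miniature, that matching step looks something like the toy sketch below. Real aligners use sophisticated index structures and tolerate sequencing errors; this only shows the principle of locating short reads in a reference sequence.

```python
# A toy version of the 'jigsaw puzzle': place short reads on a reference
# sequence by brute-force exact matching.
reference = "ACGTTAGACCTGAAC"
reads = ["GTTAG", "ACCTG", "TGAAC"]

for read in reads:
    position = reference.find(read)            # exhaustive scan
    print(f"{read} -> position {position}")    # -1 would mean 'no match'
```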

Bottleneck

All this is done by several software programs that one by one start working with the results from the sequencers. ‘Until very recently they were still fast enough to keep up with the speed of the sequencer, but nowadays the technology in this area is changing so fast - quicker than chip technology even - that the computing part has become the proverbial bottleneck of the entire analysing process,’ says Verachtert.

A major obstacle is the fact that the programs have to run in a particular order - i.e. they need each other's output to work with - so they cannot simply be spread across today's powerful multicore processors. The software also often has to redo the same work (matching and sorting). This is why Verachtert and his colleagues at Imec have been using supercomputers, built by chip giant Intel, since 2010 to speed up the process. Since 2013 they have also been working together with Janssen Pharmaceutica.

Software carrousel

The Exascience Life Lab's collaboration with one of the world's largest pharmaceutical companies is no coincidence. Verachtert's research group and the company's R&D team have similar goals: shortening clinical trials by making DNA analysis thoroughly more efficient. This public-private co-operation very recently gave birth to a new tool: elPrep, which combines all the steps needed to get sequenced data ready for the last part of DNA analysis, the search for similarities and differences with other genomes. Verachtert explains, 'With elPrep we only need to run the original data through the whole software carrousel once, and the results are written to a single file at the end of it. That's a whole lot better than how things used to be, with a series of temporary files that had to be created and read one by one.'
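The design idea - one pass, no temporary files - can be captured in a few lines. A minimal sketch under our own toy model of read records as dictionaries (this is not elPrep's actual code): compose all preparation steps and stream each record through them once, producing output only at the end.

```python
# Toy read records; a real pipeline would stream millions of these.
def drop_unmapped(record):
    return None if record.get("unmapped") else record

def mark_duplicates(record, _seen=set()):
    record["duplicate"] = record["pos"] in _seen
    _seen.add(record["pos"])
    return record

STEPS = [drop_unmapped, mark_duplicates]

def run_once(records):
    for record in records:        # a single pass over the input...
        for step in STEPS:        # ...through every step in turn,
            record = step(record)
            if record is None:    # a step may discard a record
                break
        else:
            yield record          # results written once, at the very end

reads = [{"pos": 10}, {"pos": 10}, {"pos": 42, "unmapped": True}]
print(list(run_once(reads)))
```

Because nothing is written to disk between steps, adding a step costs a function call rather than another multi-gigabyte temporary file.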

The first tests with elPrep have proven impressive: the total analysis time for one DNA sample could be cut by more than 50% (from 13 hours per genome to only 5). 'You should realise that if, as part of a clinical trial - in the search for a new medicine, for example - there are 300 DNA samples to be processed, this can take days, if not weeks. The time and cost savings made with elPrep will allow companies to do more trials, which will ultimately mean better and safer medication,' says Verachtert.
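For scale, the arithmetic behind that example, under our own simplifying assumption that the 300 samples are processed one after another on a single machine (in practice they would be spread across a cluster, shortening the wall-clock time accordingly):

```python
samples = 300
hours_before, hours_after = 13, 5   # per-genome analysis time

print(f"before: {samples * hours_before} hours (~{samples * hours_before // 24} days)")
print(f"after:  {samples * hours_after} hours (~{samples * hours_after // 24} days)")
```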

Personalised medicine

Ghent University's Bioinformatics Institute Ghent From Nucleotides to Networks (Big N2N for short) is also working on better analysis methods for molecular big data - for example, by assembling pieces of genome faster, better and in a more targeted way, and even automating the whole process. But Big N2N is also aiming at a higher level, in the area of proteomics, which focuses not on DNA and genes, but on proteins. 'The data produced by mass spectrometry on proteins is no smaller or less complex than genetic big data,' claims Katrijn Vannerum, project manager at Big N2N and systems biologist at the Flemish Life Sciences Research Institute (VIB).

She is convinced that the fast processing of big data will ultimately make precision and personalised medicine a reality - big data straight to your bedside, i.e. patient-tailored medication. 'Our researchers are tracking down mutations sensitive to therapy, for example, and identifying biomarkers for cancer in the skin and brain (substances in our body providing a detailed picture of how a particular illness is faring, ed.).'

Research into so-called lncRNAs - long, non-coding RNA molecules that play a part in the formation of tumours - is very innovative in this respect. 'By analysing big data sets, we can get a better insight into the role these lncRNAs play,' continues Vannerum. 'And last but not least, we're also working on big analyses of epigenetic genome modifications - reversible chemical changes in the genome that can affect gene expression. In the distant future this technique could be applied to cure AIDS and tackle cancer through gene therapy.'

Infectious diseases and prenatal testing

Big data is also being employed in the battle against less malignant or terrifying conditions. Vannerum explains, 'We're developing a method that uses big data sets to make more precise and quicker diagnoses and checks possible for diseases caused by bacteria. And we're working on non-invasive prenatal diagnostic tests based on thorough data analysis among pregnant women.'

For those readers who are still in doubt: big data is revolutionising plant biology too. Scientists in the field are particularly keen, for instance, on comparing the genome of a crop that is vulnerable to pests with that of a resistant variety. 'By doing that we can find specific resistance genes that breeders and biotechnologists can then work with to grow stronger crops,' states Vannerum.

More information

SymBioSys project, University of Leuven: www.kuleuven.be/symbiosys

Exascience Life Lab, Imec: www2.imec.be/be_en/research/life-sciences/exascience-life-lab.html

Big N2N, Ghent University: www.bign2n.ugent.be

The mark big data is leaving on science has grown so large that Ghent University has now changed its curriculum. The Institute of Permanent Education of the Faculty of Engineering & Architecture and Bioscience Engineering Technologies set up a separate, albeit one-off, big data study programme in 2015. www.ivpv.ugent.be/opleidingen/aanbod/bigdata2015/index2.htm


Media and data consumption in Flanders

Back in 2009, Lieven De Marez, professor in Innovation Research at Ghent University and director of the iMinds research group MICT, set up the digiMeter project to map the average Flemish person's annual use of (both traditional and online) media and ICT.

Examining the media and data consumption of the average Flemish Joe is less straightforward than it looks, though. You would think sending out invitations for an online survey would be enough, but then you obviously only reach the part of the population that already uses the internet. So in order to keep the user panel as representative and active as possible, De Marez and his colleagues hit the road every year to convince people at markets and festivals, in libraries and railway stations to take part in their digiMeter study. Their work pays off: at least 2,000 people take part every year.

Online media masters

The study puts Flemish people into five categories with original and enlightening names, depending on how they use digital media and ICT. The first category, the online media masters, are young people who have grown up with digital media and have hardly known anything else. According to the latest results (from 2014), their laptop is their most prized device, often in conjunction with their smartphone. 'These online media masters are often young people who are still studying or have only recently started working,' says Bart Vanhaelewyn, media consumption analyst at Ghent University and MICT. 'They don't tend to have the financial means (yet) to buy absolutely everything they want, so they're unlikely to have bought a tablet. In terms of online content, they tend to choose free or very cheap alternatives.'

Media omnivores

Then there is the second category, the so-called media omnivores. These are mainly people in their 30s who are very well acquainted with digital media and ICT, but have not embedded them into the fabric of their lives as well as the online media masters have done. ‘This group has more financial means to buy devices and content (like subscriptions for Netflix and Spotify) and they also tend to do so,’ Vanhaelewyn continues. ‘This category of people tends to consume a good mix of traditional media topped with a flexible layer of digital content. Almost every media omnivore has a laptop and smartphone.’

Digital explorers

Now for the third group: the digital explorers. These are people in their 40s and 50s who have only discovered and learnt to appreciate the great advantages of digital media since they have been using their tablet. ‘These are people who found a computer too complicated and unwieldy to handle, but a tablet is more intuitive,’ explains Vanhaelewyn.



‘Although they’re oriented mainly towards traditional media overall, meaning they tend to read a physical newspaper and watch the TV news, they are slowly starting to find their way in the digital world.’

Functional media users

Then there is the fourth group of functional media users. ‘They’re acquainted with the internet and computers, but they only tend to use them when absolutely necessary. Most of all, they like to stay in their comfort zone of classic media, so you won’t often catch them watching a clip on YouTube or downloading music, but they won’t recoil from sending an email or using a word processor to write a letter.’

Analogue media fans

Lastly there are the analogue media fans. 'This group is rather suspicious of digital media,' says Vanhaelewyn. It is not surprising, then, that this group has the lowest rate of internet adoption of all five categories (only 48% indicate they have an internet connection at home, while the other categories consistently reach rates above 95%). For their media consumption they need no more than their paper newspaper, magazine, traditional radio and TV (often still one with a glass cathode ray tube).

A few other striking results from the 2014 digiMeter:

- 7 in 10 Flemish people spend time media multitasking (i.e. surfing the internet on one device while looking at another screen at the same time).

- People in their 40s and 50s use tablets the most.

- In 2014, the smartphone adoption rate was higher than that of normal mobile phones for the first time.

- This does not mean that the classic text message has disappeared, though: young people who often use WhatsApp and Facebook Messenger also tend to send the most text messages.

- 68% of people have a Facebook account.

- News consumption on mobile devices is increasing dramatically.

- YouTube is the most used online music channel.

- 3 in 10 people store their data in the cloud.

More information

digiMeter, iMinds: www.iminds.be/en/gain-insights/digimeter

iMinds Research Group for Media and ICT (MICT), Ghent University: www.ugent.be/ps/communicatiewetenschappen/en/research/mict/


© MICT


The virtues and dangers of big data

© Julie Putseys


Although there is no lack of online travel planners these days, their efficiency - especially of those promising real-time travel information - is often still below par. Passengers have to sift through several sources and use multiple apps to find the information they need. The TraPIST project, supported by iMinds and the Agency for Innovation by Science and Technology (IWT), wants to do something about this. Not by drinking lots of beer, as its tasty name would suggest, but by creating Train Passenger Interfaces for Smart Travel, TraPIST for short. The aim of the project is to create an interface presenting all the information train passengers need in one easy overview. 'First of all we need to know what elements are important for an optimal travel experience,' says Jabran Bhatti, project leader at Izegem electronics company Televic, one of the businesses participating in the project. 'To do this we have enlisted test subjects with different travel profiles, who play an active part in a design process focused specifically on their experience. Based on the results of this test panel we will develop a software platform that collects data from multiple sources. Then we'll let smart analyses, classification mechanisms and filters loose on the data. Everything should of course happen in a dynamic context, because a train can suddenly be delayed at the last moment... but of course that's old news for train passengers in Belgium.'
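A minimal sketch of such a merge-and-filter step, with invented feed names and fields (our illustration, not TraPIST's actual design): combine updates from several sources and keep only what is relevant to one passenger's journey.

```python
# Invented feeds: a static schedule and a stream of delay updates.
schedule_feed = [
    {"train": "IC 1537", "dep": "09:04", "from": "Ghent-Sint-Pieters"},
    {"train": "IC 2206", "dep": "09:11", "from": "Ghent-Sint-Pieters"},
]
delay_feed = [
    {"train": "IC 1537", "delay_min": 12},
]

def enriched_overview(journey_trains, schedule, delays):
    """Merge the feeds and keep only this passenger's trains."""
    delay_by_train = {d["train"]: d["delay_min"] for d in delays}
    for leg in schedule:
        if leg["train"] in journey_trains:               # personal filter
            yield dict(leg, delay_min=delay_by_train.get(leg["train"], 0))

print(list(enriched_overview({"IC 1537"}, schedule_feed, delay_feed)))
```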

Enriched information

Bhatti calls the travel information people see on their smartphones - and which could also be displayed on screens in the station or even on board trains - enriched information. 'The design of the travel advice display was shaped by an innovative process we call co-creation,' he continues. 'So we didn't only use the test panel as a source of information, but also as fellow developers, if you will. The idea is that TraPIST should supply exactly the information train passengers need at the very moment they need it, in the way they want it.'

Efficient real-time travel information is only one digital application that can make our lives easier. Working in the cloud is another. The cloud - a virtual infrastructure with shared software and hardware we don't need to maintain ourselves, though this also means we cannot manage it ourselves! - seems to have everything needed to replace the way we used to work on computers: saving files on our own hard disk. And there is more in that cloud than we realise: think of all our profiles on social media networks, for example, or all the searches we feed into Google and Yahoo! every day.

Working in the cloud

But of course someone needs to maintain all those clouds and update the software on the supercomputers that keep them up and running. (Re-)programming that software is a gigantic task for the world's best ICT specialists, many of whom work in Silicon Valley - the Valhalla of internet technology. Well, now they are receiving some help from Flanders, from Hasselt University to be exact. During his PhD research, Tom Ameloot, postdoctoral fellow at Hasselt University's Databases and Theoretical Computer Science Research Group (DTCSRG), developed a number of tools that make programming cloud software faster, less error-prone and more efficient. Ameloot wrote a logical foundation for Bloom, a language lots of cloud software runs on. It makes using cloud solutions all over the world simpler and faster. 'In my research I mainly constructed a theory that tries to provide insight into which kinds of programming techniques can be efficient in the cloud, and which cannot,' he clarifies.

Ameloot’s group is doing a lot of research, both theoretical and applied, into working in the cloud and big quantities of data in general. ‘My colleagues Bas Ketsman and Frank Neven, for example, are occupied with cloud computing theory,’ he continues, ‘and one of our PhD students is working on a way to make executing calculations based on large quantities of data more efficient.’

And where is the limit of working in the cloud? Is the cyber sky the limit in this case? 'It all depends on the cloud application in question,' says Ameloot. 'If you want to co-ordinate the individual servers closely, the network shouldn't be too big, because calculations will slow down too much - all the computers in the network have to wait for each other, or at least for the majority. If, on the other hand, they can all run sufficiently by themselves and serve users in parallel, then it's easier to add new ones. You'll get a bigger and looser cloud that way, admittedly with little or no overall co-ordination. My guess is that many clouds taking care of data storage, like Google Drive, are of this loose type.'
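Ameloot's point can be illustrated with a small simulation of our own: a fully co-ordinated step must wait for the slowest server, so it gets slower as the network grows, while a loose cloud's response time stays flat.

```python
import random

random.seed(1)  # reproducible toy numbers

def response_times(n_servers):
    """Each server answers after a random delay between 10 and 500 ms."""
    return [random.uniform(0.01, 0.5) for _ in range(n_servers)]

for n in (5, 50, 500):
    times = response_times(n)
    coordinated = max(times)          # every step waits for the slowest server
    loose = sum(times) / len(times)   # each user talks to one server
    print(f"{n:3d} servers: co-ordinated step {coordinated:.2f} s, "
          f"loose response {loose:.2f} s on average")
```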

Online burglary

Making the switch from the old, analogue era to the new digital one may look all rosy, but it entails dangers as well - dangers we consider only rarely, if at all. Where burglars used to have to overcome physical barriers or sheer human force in order to steal sensitive information, they can now do it online, maybe even from the other side of the globe! Stealing data is not the only thing hackers do either: they can also mess with company processes from a remote location or even shut them down completely. It is downright terrifying to imagine them hacking into the internal network of a nuclear power plant - a network that was set up precisely to make remote operation of the processes possible in the first place. This makes industrial security an important area in data science research.

Hackers, viruses and worms

‘Joining the forces of IT with automation networks is causing a lot of movement in the industrial sector,’ says Johannes Cottyn, assistant professor in Automation at Ghent University and project leader at XiaK. ‘The implementation of Ethernet-based networks in production halls has made it possible to get easy access anywhere and at any time to the whole production process. If problems occur, not only the company’s own staff, but also suppliers of machinery and automation systems can log on from a remote location to diagnose the problem and even make changes to the installations.’

That’s all wonderful, but the implementation of such networks also opens the door to virtual threats from hackers and infections with computer viruses and worms. ‘Recent studies have shown industrial control systems are increasingly targeted by such attacks,’ continues Cottyn, ‘so industrial security should be a major area of attention in the design of any automation network.’


Test centre for industrial security

Cottyn also took the initiative to set up XiaK, of which he is now project leader. XiaK stands for Centre of eXcellence in Industrial Automation Kortrijk and works under the auspices of Ghent University Campus Kortrijk and Howest, University College West Flanders. 'In 2015 we started a TETRA project to support companies with industrial security,' he explains. 'You could say we set up an industrial security test centre.'

The primary aim of this TETRA project is to tackle the very current issue of industrial security at its source and to come up with field solutions. To achieve this, Cottyn and his colleagues at XiaK made a list of straightforward goals:

- Setting up an industrial security test centre. Cottyn explains, 'Our XiaK lab is equipped with some typical models of automation networks so we can do tests in a secure and controlled environment. In this simulation setting, we can develop and apply targeted attacks on a wide range of automation components, network configurations and industrial control systems, which will enable us to uncover the vulnerabilities of old and new technologies.'

- Stimulating general awareness with online quick scans of industrial security. 'We made an inventory of various automation components and technologies, with a list of risks, conditions and points of attention for every element. For such a quick scan, users (production and automation companies) just enter their configuration and get tailored feedback,' he says. (A minimal sketch of how such a scan could work follows this list.)

- Developing and validating a structured approach to executing industrial security audits. 'A structured approach will enable companies to make their existing automation networks secure step by step, covering several aspects: physical security, patch management and network architecture.'
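As announced above, a minimal sketch of how such a quick scan could work (the risk factors and advice texts are invented, not XiaK's actual inventory): match the configuration a company enters against a list of known risks and return the relevant recommendations.

```python
# Invented risk inventory: factor -> tailored advice.
RISKS = {
    "default_passwords": "Change the factory passwords on all controllers.",
    "flat_network": "Separate office IT from the automation network.",
    "open_remote_access": "Require a VPN and log every remote session.",
}

def quick_scan(configuration):
    """Return tailored feedback for the risk factors a company reports."""
    return [advice for factor, advice in RISKS.items() if factor in configuration]

print(quick_scan({"flat_network", "open_remote_access"}))
```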

Because industrial control systems and network technologies are applied in such a wide range of areas, the TETRA project has great potential for companies and even social profit organisations. ‘It’s not only automation and production companies that belong to our wider target group,’ says Cottyn, ‘but also utility companies supplying water, gas and electricity and even some secure government institutions (like prisons).’

© XiaK

More information

TraPIST, iMinds: www.iminds.be/en/projects/2014/03/20/trapist

Databases and Theoretical Computer Science Research Group, Hasselt University: alpha.uhasselt.be/research/groups/theocomp/

XiaK, Centre of eXcellence in Industrial Automation, Ghent University (Campus Kortrijk) and Howest: www.xiak.be


What if the internet stops?

Internet anywhere and all the time - we take it almost for granted. But in many, mainly developing, countries a good internet connection is still a rare luxury. Rudy Gevaert, affiliated to the ICT Department at Ghent University, is trying to do something about this, in Ethiopia, Cuba and other places. First some context, though. 'Jimma University in Ethiopia shares an internet connection (a so-called uplink) of 256 Mbit/s among all of its 1,500 computers. A standard broadband subscription in Belgium offers even mere home users 100 Mbit/s... So anyone surfing in Jimma has a far slower internet connection than we have at home.' The people in Africa need a quicker uplink then - that's the logical conclusion, or is it? 'Yes, that would be the simplest solution, but it's also the most expensive one,' he answers. 'And unfortunately it's impossible in Ethiopia. The country has only one telecom provider, so there's no competition, and prices are high, obviously. On top of this, the country has no coast, so it can't lay cables directly and needs to rely on telecom providers in neighbouring countries for its internet. And the uplink from Ethiopia is limited and already saturated.'

A better solution, Gevaert thinks, is to use the existing bandwidth more efficiently - bandwidth management and optimisation (BMO). He sums up the most important tricks of the trade (a toy illustration of two of them follows the list):

- Installing software that stores popular websites locally (or caching).

- Offering local download servers (or mirrors).

- Offering local mail servers, so that some 95% of all email traffic stays within the university network; if everyone uses Yahoo! or Hotmail, a lot of bandwidth is wasted.

- Checking all the websites that get visited for viruses and illegal content.

- Disallowing certain websites, or only allowing them outside office hours. Sites like Facebook and YouTube eat up a lot of bandwidth.
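A toy illustration of the first and last tricks - caching and time-based blocking - in a few lines of Python (our own sketch; a real deployment would use proxy software such as Squid rather than hand-written code):

```python
from datetime import datetime

BLOCKED = {"facebook.com", "youtube.com"}   # bandwidth-hungry sites
cache = {}                                  # locally stored pages

def fetch(url, now):
    host = url.split("/")[2]
    if any(host.endswith(site) for site in BLOCKED) and 8 <= now.hour < 17:
        return "blocked during office hours"
    if url in cache:                          # served from the local cache,
        return cache[url]                     # so the scarce uplink is not used
    page = f"<html>contents of {url}</html>"  # stand-in for a real download
    cache[url] = page
    return page

print(fetch("http://www.ugent.be/", datetime(2015, 6, 1, 10, 0)))
print(fetch("http://www.facebook.com/", datetime(2015, 6, 1, 10, 0)))
```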

Gevaert's approach is purely pragmatic, though. 'We're of course facing all the other problems every development project is confronted with as well: staff turnover, lack of proper education and training, inadequate electricity supply - and I could go on for a while. We have little or no influence on these external factors, so all we can do is use the limited bandwidth there is in the best possible way. That's not so difficult in itself, but putting it into practice takes time. We're also investing a lot of time in training local people so they can solve problems themselves. Then the world will be open to them.'

More information

Information and Communication Technology Department, Ghent University: www.ugent.be/en/ghentuniv/administration/dict




Author: Senne Starckx

The thematic papers are published by Research in Flanders, a project run by Flanders Knowledge Area.

The project Research in Flanders is funded by the Flemish Government, Department of Foreign Affairs.

Flanders Knowledge Area supports, through different projects, the internationalization of higher education in Flanders, Belgium.

Ravensteingalerij 27 – bus 6
1000 Brussel
T. +32 (0)2 792 55 19
www.FlandersKnowledgeArea.be

D/2015/12.812/7

Editions
1. Materials Science
2. Urban Planning
3. Industrial Design
4. Research in Times of Crisis
5. World War I
6. Food
7. Big Data

The Flemish Government cannot be held responsible for the content of this publication.


© Big N2N