Big data in practice: text analytics

"Big data" has everything to do with "analytics": analysing large amounts of data in order to extract "business intelligence", that is, information, from the data. When speaking of "data", we often think of numbers and tables, and of statistical analysis of those. But a lot of knowledge is hidden in textual data: ordinary messages written by humans, in full sentences or not, such as emails, job application letters, Twitter and Facebook messages, newspaper articles, and websites. The extracted information can then be used for a "simple" application such as searching for text fragments that match a search keyword and sorting them by relevance (in other words, a kind of "Google Search"), or for an application like sentiment analysis.
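
To make the "search sorted by relevance" idea concrete, here is a minimal sketch in Python. It is an illustration only, not the method taught in the course or used by any real search engine: the function names `tokenize` and `rank_by_relevance` are hypothetical, and relevance is assumed to be nothing more than the share of a text's words that equal the keyword.

```python
import re

def tokenize(text):
    """Lowercase the text and return its alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def rank_by_relevance(documents, keyword):
    """Return the documents containing the keyword, most relevant first.

    Assumption for this sketch: relevance = occurrences of the keyword
    divided by the total number of tokens in the document.
    """
    key = keyword.lower()
    scored = []
    for doc in documents:
        tokens = tokenize(doc)
        if not tokens:
            continue
        score = tokens.count(key) / len(tokens)
        if score > 0:
            scored.append((score, doc))
    # Highest score first; documents without the keyword were skipped above.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored]

docs = [
    "Big data and analytics",
    "Text analytics is analytics on text",
    "Job application letter",
]
for doc in rank_by_relevance(docs, "analytics"):
    print(doc)
```

Such a naive score ignores synonyms, spelling variants, and word forms, which is exactly where the real difficulty of text search begins.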

During this training, we'll first introduce the most important concepts and terminology of text analysis and "text mining", such as tokens, normalisation, lemmatisation, part-of-speech, language models, and text classification. It will quickly become clear that automated text analysis is more complicated than it might seem: language, grammar, spelling mistakes, synonyms, negation, word order, and punctuation marks all complicate the analysis. This is of course because text is primarily meant as a means of communication between humans, not to be understood by computers! Even the "simple" Google Search application turns out to be a real challenge.
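
The following sketch illustrates two of these concepts, tokens and normalisation, using only Python's standard library. It is a simplified illustration under stated assumptions, not the approach of any particular NLP package: the token pattern and the helper names `tokenize` and `term_frequencies` are hypothetical, and normalisation is reduced to lowercasing.

```python
import re
from collections import Counter

# Assumed token pattern: a run of letters or digits, optionally followed
# by an apostrophe part (so that "wasn't" stays one token).
TOKEN_PATTERN = re.compile(r"[a-z0-9]+(?:'[a-z]+)?")

def tokenize(text):
    """Normalise to lowercase and split into word tokens, dropping punctuation."""
    return TOKEN_PATTERN.findall(text.lower())

def term_frequencies(text):
    """Count how often each token occurs (a "bag of words")."""
    return Counter(tokenize(text))

sample = "The service wasn't good. Not good at all!"
print(tokenize(sample))
print(term_frequencies(sample).most_common(1))
```

Counting words in this way reports "good" as the most frequent token, although the message is clearly negative: negation and word order are lost, which is one of the complications mentioned above.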

Meanwhile, several software packages and libraries have been developed that take care of the technical foundation of "natural language processing" (NLP). During the training we will work with some of these packages, such as the NLTK toolkit, Apache OpenNLP, and Stanford's NLP suite. The use of regular expressions will also be covered.

At the end of this training, you will have built up sufficient basic expertise to set up a text mining application that uses one of the NLP libraries.

Schedule

date    duration  lang.  location     price
22 Nov  1         E      Leuven (BE)  475 EUR (excl. VAT)
Info session and registration

Intended for

This training is intended for those who want to start practising "text analytics": developers, data architects, business analysts, and market researchers wanting to obtain a better idea of the building blocks and technologies behind text analytics.

Background

Some familiarity with statistical concepts (histogram, classification, hypothesis tests) is expected; see e.g. Statistics fundamentals. A minimal programming background would also be helpful.

Main topics

Training method

Classroom training, with practical examples and supported by extensive exercises.

Duration

1 day.

Course leader

Peter Vanroose.

