Big Data Business Intelligence for Criminal Intelligence Analysis Training Course

Technological advancements and the exponential growth of information are reshaping law enforcement practices. The challenges posed by Big Data are as significant as its potential benefits. Efficient data storage is one such challenge, while effective analysis presents another.

In this instructor-led live training, participants will develop the appropriate mindset for approaching Big Data technologies, assess their impact on existing processes and policies, and implement these technologies to identify criminal activity and prevent crime. Case studies from law enforcement organizations worldwide will be examined to gain insights into their adoption approaches, challenges, and results.

By the end of this training, participants will be able to:

Integrate Big Data technology with traditional data gathering processes to construct a narrative during an investigation.
Implement industrial-strength Big Data storage and processing solutions for data analysis.
Prepare a proposal for adopting the most suitable tools and processes to enable a data-driven approach to criminal investigation.

Course Format

Interactive lectures and discussions.
Extensive exercises and practice.
Hands-on implementation in a live lab environment.

Course Customization Options

To request customized training for this course, please contact us.

This course is available as onsite live training in Portugal or online live training.

Course Outline

Day 01

Overview of Big Data Business Intelligence for Criminal Intelligence Analysis

Case Studies from Law Enforcement - Predictive Policing
Big Data adoption rate in Law Enforcement Agencies and how they are aligning their future operations around Big Data Predictive Analytics
Emerging technology solutions such as gunshot sensors, surveillance video, and social media
Using Big Data technology to mitigate information overload
Integrating Big Data with Legacy data
Basic understanding of enabling technologies in predictive analytics
Data Integration & Dashboard visualization
Fraud management
Business Rules and Fraud detection
Threat detection and profiling
Cost-benefit analysis for Big Data implementation

Introduction to Big Data

Main characteristics of Big Data -- Volume, Variety, Velocity, and Veracity.
MPP (Massively Parallel Processing) architecture
Data Warehouses – static schema, slowly evolving dataset
MPP Databases: Greenplum, Exadata, Teradata, Netezza, Vertica, etc.
Hadoop-Based Solutions – no constraints on dataset structure.
Typical pattern: HDFS, MapReduce (crunch), retrieve from HDFS
Apache Spark for stream processing
Batch processing – suited for analytical/non-interactive tasks
Velocity: CEP (Complex Event Processing) for streaming data
Typical choices – CEP products (e.g., IBM InfoSphere Streams, Apama, MarkLogic)
Less production-ready – Storm/S4
NoSQL Databases (columnar and key-value) – best suited as an analytical adjunct to data warehouses/databases

NoSQL solutions

KV Store - Keyspace, Flare, SchemaFree, RAMCloud, Oracle NoSQL Database (ONDB)
KV Store - Dynamo, Voldemort, Dynomite, SubRecord, MongoDB, DovetailDB
KV Store (Hierarchical) - GT.M, Caché
KV Store (Ordered) - Tokyo Tyrant, Lightcloud, NMDB, Lux IO, MemcacheDB, Actord
KV Cache - Memcached, Repcached, Coherence, Infinispan, eXtreme Scale, JBossCache, Velocity, Terracotta
Tuple Store - Gigaspaces, Coord, Apache River
Object Database - ZopeDB, db4o, Shoal
Document Store - CouchDB, Cloudant, Couchbase, MongoDB, Jackrabbit, XML-Databases, ThruDB, CloudKit, Persevere, Riak (Basho), Scalaris
Wide Columnar Store - BigTable, HBase, Apache Cassandra, Hypertable, KAI, OpenNeptune, Qbase, KDI

Varieties of Data: Introduction to Data Cleaning issues in Big Data

RDBMS – static structure/schema, does not promote an agile, exploratory environment.
NoSQL – semi-structured; enough structure to store data without defining an exact schema in advance
Data cleaning issues

Hadoop

When to select Hadoop?
STRUCTURED - Enterprise data warehouses/databases can store massive data (at a cost) but impose structure (not ideal for active exploration)
SEMI-STRUCTURED data – difficult to process using traditional solutions (DW/DB)
Warehousing data = huge effort, and the result remains static even after implementation
For variety & volume of data, processed on commodity hardware – HADOOP
Commodity H/W needed to create a Hadoop Cluster

Introduction to MapReduce/HDFS

MapReduce – distribute computing over multiple servers
HDFS – make data available locally for the computing process (with redundancy)
Data – can be unstructured/schema-less (unlike RDBMS)
Developer responsibility to make sense of data
Programming MapReduce = working with Java (pros/cons), manually loading data into HDFS
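As a rough illustration of the MapReduce model outlined above, the following pure-Python sketch counts incidents per offence type without using Hadoop at all. The record layout ("id,offence_type"), the function names, and the sample data are hypothetical and chosen only for illustration; an actual Hadoop job would distribute the map and reduce steps across servers and read its input from HDFS.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(record: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit (offence_type, 1) for one hypothetical 'id,offence_type' record."""
    _, offence = record.split(",", 1)
    yield offence.strip().lower(), 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: sum the counts collected for one key."""
    return key, sum(values)


def run_job(records: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in a single process (no distribution)."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical sample input; note the inconsistent capitalisation,
# which the map step normalises.
records = ["1001,Burglary", "1002,Fraud", "1003,burglary"]
counts = run_job(records)
print(counts)
```

The point of the sketch is the division of labour: the developer writes only the map and reduce functions and is responsible for making sense of the raw records, while grouping by key is handled by the framework.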

Day 02

Big Data Ecosystem -- Building Big Data ETL (Extract, Transform, Load) -- Which Big Data Tools to use and when?

Hadoop vs. Other NoSQL solutions
For interactive, random access to data
HBase (column-oriented database) on top of Hadoop
Random access to data but restrictions imposed (max 1 PB)
Not ideal for ad-hoc analytics, good for logging, counting, time-series
Sqoop - Import from databases to Hive or HDFS (JDBC/ODBC access)
Flume – Stream data (e.g., log data) into HDFS

Big Data Management System

Moving parts, compute nodes start/fail: ZooKeeper - For configuration/coordination/naming services
Complex pipeline/workflow: Oozie – manage workflow, dependencies, daisy chain
Deploy, configure, cluster management, upgrade, etc. (sys admin): Ambari
In Cloud: Whirr

Predictive Analytics -- Fundamental Techniques and Machine Learning based Business Intelligence

Introduction to Machine Learning
Learning classification techniques
Bayesian Prediction -- preparing a training file
Support Vector Machine
KNN p-Tree Algebra & vertical mining
Neural Networks
Big Data large variable problem -- Random Forest (RF)
Big Data Automation problem – Multi-model ensemble RF
Automation through Soft10-M
Text analytics tool: Treeminer
Agile learning
Agent-based learning
Distributed learning
Introduction to Open Source Tools for predictive analytics: R, Python, RapidMiner, Mahout

Predictive Analytics Ecosystem and its application in Criminal Intelligence Analysis

Technology and the investigative process
Insight analytics
Visualization analytics
Structured predictive analytics
Unstructured predictive analytics
Threat/fraudster/vendor profiling
Recommendation Engine
Pattern detection
Rule/Scenario discovery – failure, fraud, optimization
Root cause discovery
Sentiment analysis
CRM analytics
Network analytics
Text analytics for obtaining insights from transcripts, witness statements, internet chatter, etc.
Technology-assisted review
Fraud analytics
Real-Time Analytics

Day 03

Real-Time and Scalable Analytics Over Hadoop

Why common analytic algorithms fail in Hadoop/HDFS
Apache Hama - for Bulk Synchronous distributed computing
Apache Spark - for cluster computing and real-time analytics
CMU GraphLab - Graph-based asynchronous approach to distributed computing
KNN p-Tree -- Algebra-based approach from Treeminer for reduced hardware cost of operation

Tools for eDiscovery and Forensics

eDiscovery over Big Data vs. Legacy data – a comparison of cost and performance
Predictive coding and Technology-Assisted Review (TAR)
Live demo of vMiner for understanding how TAR enables faster discovery
Faster indexing through HDFS – Velocity of data
NLP (Natural Language Processing) – open source products and techniques
eDiscovery in foreign languages -- technology for foreign language processing

Big Data BI for Cyber Security – Getting a 360-degree view, speedy data collection, and threat identification

Understanding the basics of security analytics -- attack surface, security misconfiguration, host defenses
Network infrastructure / Large datapipe / Response ETL for real-time analytics
Prescriptive vs predictive – Fixed rule-based vs auto-discovery of threat rules from metadata

Gathering disparate data for Criminal Intelligence Analysis

Using IoT (Internet of Things) as sensors for capturing data
Using Satellite Imagery for Domestic Surveillance
Using surveillance and image data for criminal identification
Other data gathering technologies -- drones, body cameras, GPS tagging systems, and thermal imaging technology
Combining automated data retrieval with data obtained from informants, interrogation, and research
Forecasting criminal activity

Day 04

Fraud Prevention BI from Big Data in Fraud Analytics

Basic classification of Fraud Analytics -- rules-based vs predictive analytics
Supervised vs unsupervised Machine learning for Fraud pattern detection
Business-to-business fraud, medical claims fraud, insurance fraud, tax evasion, and money laundering
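To make the contrast between rules-based and data-driven detection concrete, here is a minimal sketch using only the Python standard library. The fixed threshold, the z-score cutoff, and the transaction amounts are all hypothetical, and the z-score check stands in for an unsupervised method only in the loosest sense; it is not a production fraud model.

```python
from statistics import mean, stdev
from typing import List


def rule_flags(amounts: List[float], threshold: float) -> List[bool]:
    """Rules-based check: flag any amount above a fixed, hand-set threshold."""
    return [amount > threshold for amount in amounts]


def zscore_flags(amounts: List[float], z_cutoff: float = 2.0) -> List[bool]:
    """Unsupervised check: flag amounts far from the mean of the data itself."""
    mu = mean(amounts)
    sigma = stdev(amounts)  # sample standard deviation; needs at least 2 values
    if sigma == 0:
        return [False] * len(amounts)
    return [abs(amount - mu) / sigma > z_cutoff for amount in amounts]


# Hypothetical claim amounts; the last one is unusual for this data set
# but still below the hand-set rule threshold.
amounts = [100, 110, 95, 105, 98, 102, 5000]

by_rule = rule_flags(amounts, threshold=10_000)
by_zscore = zscore_flags(amounts)
print(by_rule)
print(by_zscore)
```

In this toy data the fixed rule flags nothing because no amount exceeds the threshold, while the statistical check flags the last amount because it is unusual relative to the rest. Supervised approaches go one step further and learn from cases already labelled as fraud.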

Social Media Analytics -- Intelligence gathering and analysis

How Social Media is used by criminals to organize, recruit, and plan
Big Data ETL API for extracting social media data
Text, image, metadata, and video
Sentiment analysis from social media feed
Contextual and non-contextual filtering of social media feed
Social Media Dashboard to integrate diverse social media
Automated profiling of social media profiles
Live demo of each analytic using the Treeminer tool

Big Data Analytics in image processing and video feeds

Image Storage techniques in Big Data -- Storage solution for data exceeding petabytes
LTFS (Linear Tape File System) and LTO (Linear Tape Open)
GPFS-LTFS (General Parallel File System - Linear Tape File System) -- layered storage solution for Big image data
Fundamentals of image analytics
Object recognition
Image segmentation
Motion tracking
3-D image reconstruction

Biometrics, DNA, and Next Generation Identification Programs

Beyond fingerprinting and facial recognition
Speech recognition, keystroke (analyzing a user's typing pattern), and CODIS (Combined DNA Index System)
Beyond DNA matching: using forensic DNA phenotyping to construct a face from DNA samples

Big Data Dashboard for quick access to and display of diverse data

Integration of existing application platform with Big Data Dashboard
Big Data management
Case Study of Big Data Dashboard: Tableau and Pentaho
Using Big Data apps to push location-based services in government
Tracking system and management

Day 05

How to justify Big Data BI implementation within an organization

Defining the ROI (Return on Investment) for implementing Big Data
Case studies for saving Analyst Time in collection and preparation of Data – increasing productivity
Savings from lower database licensing costs
Revenue gain from location-based services
Cost savings from fraud prevention
An integrated spreadsheet approach for calculating approximate expenses vs. revenue gain/savings from Big Data implementation
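The spreadsheet-style comparison of expenses against gains and savings can be sketched in a few lines. Every figure and category name below is hypothetical and serves only to show the arithmetic; the formula used is the common simple ROI, (benefit - cost) / cost, which may differ from the method presented in the course.

```python
from typing import Dict


def simple_roi(total_cost: float, total_benefit: float) -> float:
    """Return simple ROI as a fraction: (benefit - cost) / cost."""
    if total_cost <= 0:
        raise ValueError("total_cost must be positive")
    return (total_benefit - total_cost) / total_cost


# Hypothetical annual figures in euros, for illustration only.
costs: Dict[str, float] = {
    "hardware": 120_000,
    "licences": 30_000,
    "training": 50_000,
}
benefits: Dict[str, float] = {
    "analyst_time_saved": 150_000,
    "lower_db_licensing": 60_000,
    "fraud_prevented": 90_000,
}

total_cost = sum(costs.values())
total_benefit = sum(benefits.values())
roi = simple_roi(total_cost, total_benefit)
print(f"Cost: {total_cost}, benefit: {total_benefit}, ROI: {roi:.0%}")
```

Keeping costs and benefits as named line items mirrors a spreadsheet layout and makes it easy to see which assumption drives the result.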

Step-by-step procedure for replacing a legacy data system with a Big Data System

Big Data Migration Roadmap
What critical information is needed before architecting a Big Data system?
What are the different ways of calculating the Volume, Velocity, Variety, and Veracity of data?
How to estimate data growth
Case studies
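One simple way to estimate data growth is to assume a constant annual growth rate and compound it. The sketch below does this; the starting volume, growth rate, and capacity are hypothetical inputs, and real growth is rarely this regular, so treat the result as a first approximation only.

```python
import math


def projected_volume(current_tb: float, annual_growth: float, years: float) -> float:
    """Project stored volume (TB) after `years` of compound growth.

    `annual_growth` is a fraction, e.g. 0.5 for 50% per year.
    """
    return current_tb * (1.0 + annual_growth) ** years


def years_until_capacity(current_tb: float, annual_growth: float, capacity_tb: float) -> float:
    """Estimate how many years remain before `capacity_tb` is reached."""
    if annual_growth <= 0:
        raise ValueError("annual_growth must be positive")
    if capacity_tb <= current_tb:
        return 0.0
    return math.log(capacity_tb / current_tb) / math.log(1.0 + annual_growth)


# Hypothetical inputs: 100 TB today, growing 50% per year.
print(projected_volume(100, 0.5, 2))
print(years_until_capacity(100, 0.5, 225))
```

The same two functions answer the two planning questions that usually come up: how much storage will be needed by a given date, and when the current capacity will run out.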

Review of Big Data Vendors and their products

Accenture
APTEAN (Formerly CDC Software)
Cisco Systems
Cloudera
Dell
EMC
GoodData Corporation
Guavus
Hitachi Data Systems
Hortonworks
HP
IBM
Informatica
Intel
Jaspersoft
Microsoft
MongoDB (Formerly 10Gen)
Mu Sigma
NetApp
Opera Solutions
Oracle
Pentaho
Platfora
QlikTech
Quantum
Rackspace
Revolution Analytics
Salesforce
SAP
SAS Institute
Sisense
Software AG/Terracotta
Soft10 Automation
Splunk
Sqrrl
Supermicro
Tableau Software
Teradata
Think Big Analytics
Tidemark Systems
Treeminer
VMware (Part of EMC)

Q/A session

Requirements

Knowledge of law enforcement processes and data systems
Basic understanding of SQL/Oracle or relational databases
Basic understanding of statistics (at the spreadsheet level)

Audience

Law enforcement specialists with a technical background

35 Hours

Custom Corporate Training

Training solutions designed exclusively for businesses.

Customized Content: We adapt the syllabus and practical exercises to the real goals and needs of your project.
Flexible Schedule: Dates and times adapted to your team's agenda.
Format: Online (live), In-company (at your offices), or Hybrid.

Investment

Price per private group, online live training, starting from 6500 € + VAT*

(*The final price may vary depending on the technical specialization of the course, the level of customization, the method of delivery and the number of learners)

Need help picking the right course?
info@nobleprog.pt or +351 30 050 9666
