
Dealing with poor data quality of OSINT data in fraud risk analysis


Academic year: 2021



Governmental organizations responsible for keeping certain types of fraud under control often use data-driven methods, both for immediate detection of fraud and for fraud risk analysis aimed at targeting inspections more effectively. A blind spot in such methods is that the source data often represents a 'paper reality': fraudsters will attempt to disguise themselves in the data they supply, painting a world in which they do nothing wrong. This blind spot can be counteracted by enriching the data with traces and indicators from more 'real-world' sources such as social media and the internet. One of the crucial data management problems in accomplishing this enrichment is how to capture and handle data quality problems. The presentation will start with a real-world example, which is also used as the starting point for a problem generalization in terms of information combination and enrichment (ICE). We then present the ICE technology as well as how data quality problems can be managed with probabilistic databases. In terms of the 4 V's of big data (volume, velocity, variety and veracity), this presentation focuses on the third and fourth V's: variety and veracity.
