Getting Onboard with HTTPS, Big Data Tech and The Search for Talent in Data Analytics – CF006

This week we talk news and implementation methodologies for HTTPS, take a first look at some of the newer big data top-level projects at Apache, and even get a bit philosophical talking about the search for IT and data analytics talent in the USA. Joined this week by a new guest and future co-host, Ashton Webster, an undergraduate in the Cybersecurity Honors College (ACES) at the University of Maryland, we have an hour of jam-packed conversation that spans multiple topics in cybersecurity and big data.

Cyber Frontiers is all about Exploring Cybersecurity, Big Data, and the Technologies Shaping the Future Through an Academic Perspective! Christian Johnson, a student at the University of Maryland, will bring fresh and relevant topics to the show based on the current work he does.

Support the Average Guy Tech Scholarship Fund: https://www.patreon.com/theaverageguy

WANT TO SUBSCRIBE? We now have Video Large / Small and Video iTunes options at http://theAverageGuy.tv/subscribe

You can contact us via email at jim@theaverageguy.tv or call in your questions or comments to be played on the show at (402) 478-8450.


HTTPS/SSL OVERVIEW

Suppose I want to get a message (say, an HTML page) to my friend, but I realize that someone may be intercepting our messages. How can I assure my friend that nobody in between (a man-in-the-middle, or MITM) tampered with it? By attaching a message authentication code: if the message changes at all, the code changes. And by the way, it would be nice to have encryption to hide the message, and compression to make it smaller…
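To make the message authentication code idea concrete, here is a minimal sketch using Python's standard `hmac` module. The shared key, the helper names `make_tag` and `verify`, and the sample messages are all made up for illustration; real TLS derives its keys during the handshake rather than hard-coding them.

```python
import hashlib
import hmac

# Hypothetical shared secret for illustration only.
# In TLS the keys are negotiated during the handshake.
SHARED_KEY = b"secret-known-only-to-me-and-my-friend"


def make_tag(message: bytes, key: bytes = SHARED_KEY) -> str:
    """Return an HMAC-SHA256 tag (hex string) for the message."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(message: bytes, tag: str, key: bytes = SHARED_KEY) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(make_tag(message, key), tag)


original = b"<html><body>Meet at noon</body></html>"
tag = make_tag(original)

# Someone in the middle changes a single word of the message.
tampered = b"<html><body>Meet at nine</body></html>"

print(verify(original, tag))  # True
print(verify(tampered, tag))  # False
```

Without the key, the person in the middle cannot produce a matching code for the altered message, which is why the receiver can detect the change.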

  • HTTPS = HTTP with SSL (now TLS)

  • Introduction to SSL, how it works
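As a rough sketch of what "HTTP with TLS" looks like from the client side, the snippet below uses Python's standard `ssl` and `socket` modules. The function names `make_client_context` and `fetch_status_line` are hypothetical helpers, not part of any library, and the choice of TLS 1.2 as the minimum version is an assumption for the example.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    """Build a client context that verifies the server certificate."""
    ctx = ssl.create_default_context()
    # Assumption for this sketch: refuse anything older than TLS 1.2.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def fetch_status_line(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Send a plain HTTP HEAD request through a TLS-wrapped socket."""
    ctx = make_client_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        # The handshake and certificate check happen inside wrap_socket.
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            request = (
                f"HEAD / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
            )
            tls.sendall(request.encode("ascii"))
            response = tls.recv(4096).decode("iso-8859-1")
    # First line of the response, e.g. the HTTP status line.
    return response.split("\r\n", 1)[0]


context = make_client_context()
print(context.verify_mode == ssl.CERT_REQUIRED)  # certificate must validate
print(context.check_hostname)                    # hostname must match the cert
```

Calling `fetch_status_line` with a real hostname needs network access. The HTTP request itself is unchanged; TLS only wraps the connection it travels over.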

Implementing SSL on the Cheap:

https://www.globalsign.com/ssl-information-center/types-of-ssl-certificate.html

OVERVIEW OF BIG DATA & APACHE

Starting at the bottom: HDFS http://hadoop.apache.org/docs/r1.2.1/hdfs_design.html

YARN

Serves as a broker between applications and the cluster: it handles resource management and scheduling so that multiple applications can run side by side and interact with HDFS. http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html

Brainstorming thoughts (AW):

  • Christian – Hadoop story time?

  • Storm – http://storm.incubator.apache.org/

    • Streaming data (*real time*)

    • Moving from tuple to batch (define batch and tuple)

    • Before even entering the database! 

    • Storm can still store its final results in HDFS, and even read streams from it https://github.com/ptgoetz/storm-hdfs

    • Why Researchers should care

      • Processing the data as it comes in would significantly cut down on the sheer volume of data to post-process (i.e. a few gigs a second as it comes in instead of terabytes at once afterward)

      • Analysis and collection virtually simultaneous = efficient

  • Why Enterprise should care: Real time data means real time decisions (or at least faster ones)

    • List of companies that use Storm: https://storm.incubator.apache.org/documentation/Powered-By.html includes Spotify, The Weather Channel, WebMD, and other big names!

    • Intrusion detection systems
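To pin down "tuple" versus "batch" from the notes above, here is a small plain-Python sketch. It does not use the Storm API; the event layout (source address, byte count), the sample data, and every function name are assumptions made up for illustration. A tuple is one record handled the moment it arrives, while a batch is a group of records collected first and then processed together.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

# Hypothetical tuple layout: (source address, bytes transferred).
Event = Tuple[str, int]


def event_stream() -> Iterator[Event]:
    """Stand-in for a live feed; yields one event (tuple) at a time."""
    sample = [
        ("10.0.0.1", 500),
        ("10.0.0.2", 1200),
        ("10.0.0.1", 300),
        ("10.0.0.3", 50),
        ("10.0.0.2", 800),
    ]
    yield from sample


def process_per_tuple(stream: Iterable[Event]) -> Counter:
    """Streaming style: update the running totals as each tuple arrives."""
    totals: Counter = Counter()
    for source, size in stream:
        totals[source] += size
    return totals


def batches(stream: Iterable[Event], size: int) -> Iterator[List[Event]]:
    """Group the stream into lists of at most `size` events."""
    batch: List[Event] = []
    for event in stream:
        batch.append(event)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly smaller, batch
        yield batch


def process_in_batches(stream: Iterable[Event], size: int = 2) -> Counter:
    """Batch style: summarize each group, then merge into the totals."""
    totals: Counter = Counter()
    for batch in batches(stream, size):
        batch_totals: Counter = Counter()
        for source, n in batch:
            batch_totals[source] += n
        totals.update(batch_totals)  # Counter.update adds the counts
    return totals


print(process_per_tuple(event_stream()))
print(process_in_batches(event_stream(), size=2))
```

Both styles arrive at the same totals. The difference is when the answer becomes available: the per-tuple version has an up-to-date total after every event, while the batch version only updates once each group is full.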

Future topics…

  • Signature-based vs “baseline”-based malicious behavior detection – true positives, false positives, etc. and the cost of mistakes

  • Data lakes versus data warehouses

 


Jim’s Twitter: http://twitter.com/#!/jcollison

Contact Christian: christian@theaverageguy.tv

Contact the show at jim@theaverageguy.tv

Find this and other great Podcasts from the Average Guy Network at http://theaverageguy.tv

Music courtesy of Ryan King. Check out the Die Hard Cafe band and other original works at:
http://diehardcafe.bandcamp.com/ and http://cokehabitgo.tumblr.com/tagged/my-music

Some links may contain affiliate codes that benefit the Average Guy Podcast Network.