Getting Onboard with HTTPS, Big Data Tech and The Search for Talent in Data Analytics – CF006
This week we talk news and implementation methodologies for HTTPS, take a first look at some of the newer top-level big data projects at Apache, and even get a bit philosophical talking about the search for IT and data analytics talent in the USA. Joined this week by a new guest and future co-host, Ashton Webster, an undergraduate in the Cybersecurity Honors College (ACES) at the University of Maryland, we have an hour of jam-packed conversation that spans multiple topics in cybersecurity and big data.
Cyber Frontiers is all about Exploring Cybersecurity, Big Data, and the Technologies Shaping the Future Through an Academic Perspective! Christian Johnson, a student at the University of Maryland, will bring fresh and relevant topics to the show based on his current work.
Support the Average Guy Tech Scholarship Fund: https://www.patreon.com/theaverageguy
WANT TO SUBSCRIBE? We now have Video Large / Small and Video iTunes options at http://theAverageGuy.tv/subscribe
You can contact us via email at jim@theaverageguy.tv or call in your questions or comments to be played on the show at (402) 478-8450
HTTPS/SSL OVERVIEW
I want to get a message to my friend (HTML), but I realize that there is someone intercepting our messages. How can I guarantee to my friend that there wasn’t someone in between (MITM)? By creating a message authentication code: if the message changes at all, the code changes. And by the way, it would be nice to have encryption to hide the message, and compression to make it smaller…
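The message authentication code idea can be sketched with Python's standard hmac module. This is a minimal sketch that assumes both friends already share a secret key (in TLS, secrets like this are established during the handshake rather than hard-coded); the key and the messages are made up for illustration:

```python
import hashlib
import hmac

# Hypothetical shared secret; an interceptor without it cannot forge a valid tag.
SECRET_KEY = b"shared-secret-agreed-in-advance"


def make_tag(message: bytes, key: bytes = SECRET_KEY) -> str:
    """Return an HMAC-SHA256 tag (hex string) for the message."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(message: bytes, tag: str, key: bytes = SECRET_KEY) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(make_tag(message, key), tag)


original = b"<p>Meet me at noon</p>"
tag = make_tag(original)

print(verify(original, tag))                  # True: message arrived unchanged
print(verify(b"<p>Meet me at 1pm</p>", tag))  # False: someone altered it in transit
```

Changing even one character of the message produces a completely different tag, so the friend on the other end can tell the message was tampered with.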
HTTPS = HTTP with SSL (now TLS)
Introduction to SSL, how it works
The TLS handshake: the client and server agree upon a method of exchanging messages. At the very least, they guarantee message authentication (MAC); optionally, they agree upon encryption and compression.
Prevents man-in-the-middle attacks; newer protocol versions and patched implementations also defend against attacks such as BEAST and insecure renegotiation.
SSL Pulse, an interesting site that tracks which popular websites are vulnerable to SSL attacks – https://www.trustworthyinternet.org/ssl-pulse/
Usage statistics for SSL certificates: http://trends.builtwith.com/ssl – you can also search for virtually any software in the search bar; here is OpenSSL: http://trends.builtwith.com/Server/OpenSSL
LibreSSL – http://www.libressl.org/ – not the sexiest website, but hey, they are busy fixing the world's insecure SSL problems.
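The handshake and its protections are negotiated for you by a TLS library such as OpenSSL or LibreSSL. As a minimal sketch using Python's standard ssl module, a client can be configured so that certificate and hostname checks are enforced and TLS compression stays off. The TLS 1.2 floor and the fetch_negotiated_params helper name are our own assumptions for illustration, not settings discussed on the show:

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    # create_default_context() enables certificate verification and hostname
    # checking, which is what stops a man-in-the-middle with a forged certificate.
    ctx = ssl.create_default_context()
    # Assumption: TLS 1.2 is an acceptable minimum; older versions are refused.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # Keep TLS-level compression off (compression enabled the CRIME attack).
    ctx.options |= ssl.OP_NO_COMPRESSION
    return ctx


def fetch_negotiated_params(host: str, port: int = 443):
    """Perform a real handshake and report what client and server agreed on."""
    ctx = make_client_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        # server_hostname sends SNI and is checked against the certificate.
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            # version(): e.g. "TLSv1.3"; cipher(): (name, protocol, secret bits);
            # compression(): None when no compression was negotiated.
            return tls.version(), tls.cipher(), tls.compression()
```

With network access, calling fetch_negotiated_params("www.example.com") shows the protocol version and cipher suite the two sides settled on during the handshake.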
Implementing SSL on the Cheap:
https://www.globalsign.com/ssl-information-center/types-of-ssl-certificate.html
http://www.ssls.com/ – $25 for 5 years' worth of a domain-validated SSL certificate
Validation levels: Domain, Organization, and Extended Validation
For the “average guy” – here’s why domain validation is the way to go.
Google now factors HTTPS into its search ranking algorithm – http://arstechnica.com/security/2014/08/in-major-shift-google-boosts-search-rankings-of-https-protected-sites/
Post-Snowden world – SSL adoption pretty much tripled – http://thinkprogress.org/world/2014/05/17/3438919/more-people-turn-to-encryption-after-snowden-leaks/
Configuration Tips:
http://www.whynopadlock.com/check.php – finds insecure (mixed-content) elements on a page
SNI (Server Name Indication) – multiple SSL sites on one IP address in a multi-host environment: https://wiki.apache.org/httpd/NameBasedSSLVHostsWithSNI
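A missing padlock is usually caused by an HTTPS page loading sub-resources over plain http://. A rough local version of that check can be sketched with Python's built-in html.parser; the list of tags below is our own assumption of which elements pull in resources, and it is not how whynopadlock.com itself works:

```python
from html.parser import HTMLParser

# Tags whose attribute pulls in a sub-resource; an http:// URL here on an
# https:// page is "mixed content". (A rough check: it also flags
# non-resource <link> tags such as rel="canonical".)
RESOURCE_ATTRS = {
    "img": "src",
    "script": "src",
    "iframe": "src",
    "audio": "src",
    "video": "src",
    "source": "src",
    "link": "href",  # stylesheets, icons
}


class MixedContentFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.insecure = []

    def handle_starttag(self, tag, attrs):
        wanted = RESOURCE_ATTRS.get(tag)
        for name, value in attrs:
            # value is None for bare attributes like <script async>
            if name == wanted and value and value.lower().startswith("http://"):
                self.insecure.append((tag, value))


def find_mixed_content(html: str):
    """Return (tag, url) pairs for resources loaded over plain HTTP."""
    finder = MixedContentFinder()
    finder.feed(html)
    finder.close()
    return finder.insecure


page = ('<img src="http://example.com/logo.png">'
        '<script src="https://example.com/app.js"></script>')
print(find_mixed_content(page))  # [('img', 'http://example.com/logo.png')]
```

Only the image is flagged; the script is already served over HTTPS.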
OVERVIEW OF BIG DATA & APACHE
Starting at the bottom: HDFS (Hadoop Distributed File System) http://hadoop.apache.org/docs/r1.2.1/hdfs_design.html
YARN (Yet Another Resource Negotiator)
Sits between applications and the cluster's resources: it handles resource management and scheduling so that multiple applications can run at once against data in HDFS http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html
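HDFS stores each file as a sequence of large blocks and keeps copies of every block on several DataNodes. A toy sketch of that idea, assuming a 128 MB block size (the Hadoop 2.x default) and a replication factor of 3; the DataNode names are hypothetical, and the round-robin placement is a simplification, not HDFS's rack-aware placement policy:

```python
from itertools import cycle, islice


def split_into_blocks(file_size_mb: int, block_size_mb: int = 128):
    """Return block sizes in MB; the last block may be smaller than the rest."""
    full, remainder = divmod(file_size_mb, block_size_mb)
    return [block_size_mb] * full + ([remainder] if remainder else [])


def place_replicas(num_blocks: int, datanodes, replication: int = 3):
    """Toy placement: each block's replicas land on distinct DataNodes,
    rotating the starting node so blocks spread across the cluster."""
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    placement = {}
    for block_id in range(num_blocks):
        start = block_id % len(datanodes)
        placement[block_id] = list(islice(cycle(datanodes), start, start + replication))
    return placement


blocks = split_into_blocks(300)  # a 300 MB file
print(blocks)                    # [128, 128, 44]
print(place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"]))
# {0: ['dn1', 'dn2', 'dn3'], 1: ['dn2', 'dn3', 'dn4'], 2: ['dn3', 'dn4', 'dn1']}
```

Losing any single DataNode still leaves two copies of every block, which is the point of replication.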
Brainstorming thoughts (AW):
Christian – Hadoop story time?
Storm – http://storm.incubator.apache.org/
Streaming data (*real time*)
moving from tuple to batch (define batch and tuple)
Before even entering the database!
Storm can still store its final results in HDFS, and even get streams from it https://github.com/ptgoetz/storm-hdfs
Why researchers should care
Processing the data as it comes in would significantly cut down on the sheer volume of data to post-process (i.e., a few gigs a second as it arrives instead of terabytes all at once afterward)
analysis and collection virtually simultaneous = efficient
Why Enterprise should care: Real time data means real time decisions (or at least faster ones)
List of companies that use Storm: https://storm.incubator.apache.org/documentation/Powered-By.html – includes Spotify, The Weather Channel, WebMD, and other big names!
Intrusion detection systems
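To make the tuple-at-a-time idea concrete for the intrusion detection case, here is a sketch in plain Python generators, not the Storm API: a "spout" emits login events one tuple at a time, and a "bolt" keeps a sliding window of failed logins per IP and raises an alert as soon as one IP crosses a threshold. The event format, the 60-second window, and the threshold of 5 are all hypothetical choices:

```python
from collections import defaultdict, deque
from typing import Iterable, Iterator, Tuple

Event = Tuple[float, str, str]  # (timestamp_seconds, source_ip, outcome)


def login_spout(events: Iterable[Event]) -> Iterator[Event]:
    """Stand-in for a spout: emits one tuple at a time as it 'arrives'."""
    for event in events:
        yield event


def brute_force_bolt(stream: Iterator[Event], window_s: float = 60.0,
                     threshold: int = 5) -> Iterator[Tuple[float, str, int]]:
    """Stand-in for a bolt: tracks failed logins per IP in a sliding time
    window and emits an alert the moment an IP's count reaches the threshold."""
    failures = defaultdict(deque)
    for ts, ip, outcome in stream:
        if outcome != "FAIL":
            continue
        window = failures[ip]
        window.append(ts)
        # Drop failures that have slid out of the time window.
        while window and ts - window[0] > window_s:
            window.popleft()
        if len(window) == threshold:
            yield (ts, ip, len(window))


events = [(0.0, "10.0.0.7", "OK")] + [(float(t), "10.0.0.9", "FAIL") for t in range(0, 50, 10)]
print(list(brute_force_bolt(login_spout(events))))  # [(40.0, '10.0.0.9', 5)]
```

The alert fires at the fifth failure, while events are still streaming in, instead of turning up later in a batch job over stored logs.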
Spark – https://spark.apache.org/
Evolution of MapReduce (disk -> in-memory)
Scala and Python – word count in a line instead of a bunch of
Machine learning (anecdote about Ben and his long processes)
Why researchers should care
Why enterprise should care: Cuts down on time spent processing
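Word count is the classic MapReduce example. In PySpark the whole job is roughly one chained expression, sc.textFile(path).flatMap(lambda line: line.split()).map(lambda w: (w, 1)).reduceByKey(add), with intermediate data kept in memory where possible. The sketch below spells out the same map, shuffle, and reduce phases in plain Python so the steps are visible; it runs locally and is only an illustration of the model, not Spark or Hadoop code:

```python
from collections import defaultdict
from functools import reduce
from operator import add


def map_phase(lines):
    """Emit (word, 1) pairs (lowercased), like a mapper or Spark's flatMap + map."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def shuffle_phase(pairs):
    """Group values by key; Hadoop MapReduce spills this intermediate data to disk."""
    groups = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups


def reduce_phase(groups):
    """Sum each word's counts, like a reducer or Spark's reduceByKey(add)."""
    return {word: reduce(add, counts) for word, counts in groups.items()}


def word_count(lines):
    return reduce_phase(shuffle_phase(map_phase(lines)))


print(word_count(["big data is big", "data is data"]))
# {'big': 2, 'data': 3, 'is': 2}
```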
Tez – DAGS instead of multiple map reduce programs http://hortonworks.com/hadoop/tez/
Using vertices as processing points and edges as data flow (similar to storm streams and bolts) http://hortonworks.com/blog/expressing-data-processing-in-apache-tez/
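The idea behind Tez is to express a whole pipeline as one directed acyclic graph, where vertices are processing steps and edges carry data between them, instead of chaining separate MapReduce jobs. A minimal sketch using Python's graphlib (3.9+), not the Tez API: the job, its vertex names, and the tasks are hypothetical, and each vertex runs once all of its upstream inputs are ready:

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical job: each key is a vertex; its value is the set of vertices
# whose output flows into it along an edge.
dag = {
    "read_logs": set(),
    "read_users": set(),
    "parse_logs": {"read_logs"},
    "join": {"parse_logs", "read_users"},
    "aggregate": {"join"},
    "write_output": {"aggregate"},
}

# Hypothetical processing steps; inputs arrive sorted by upstream vertex name.
tasks = {
    "read_logs": lambda: ["alice GET /", "bob GET /a", "alice GET /b"],
    "read_users": lambda: {"alice": "admin", "bob": "guest"},
    "parse_logs": lambda logs: [line.split()[0] for line in logs],
    "join": lambda names, roles: [roles[n] for n in names],
    "aggregate": lambda roles: {r: roles.count(r) for r in sorted(set(roles))},
    "write_output": lambda counts: sorted(counts.items()),
}


def run_dag(graph, steps):
    """Run each vertex after all its inputs, passing outputs along the edges."""
    results = {}
    for vertex in TopologicalSorter(graph).static_order():
        inputs = [results[parent] for parent in sorted(graph[vertex])]
        results[vertex] = steps[vertex](*inputs)
    return results


print(run_dag(dag, tasks)["write_output"])  # [('admin', 2), ('guest', 1)]
```

The join needs both the parsed logs and the user table, which a single DAG expresses directly; with plain MapReduce that would typically mean extra jobs with intermediate output written out between them.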
Future topics…
Signature based vs “baseline” based malicious behavior detection – True positives, false positives, etc. and cost of mistakes
Data lakes versus data warehouses
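For the signature vs. baseline discussion, the usual way to compare detectors is a confusion matrix plus a cost per mistake. The sketch below uses made-up counts, chosen only to show how the trade-off can play out, and hypothetical costs (50 per false alarm for analyst time, 5000 per missed attack); none of these numbers come from a real system:

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # malicious events correctly flagged
    fp: int  # benign events wrongly flagged (false alarms)
    fn: int  # malicious events missed
    tn: int  # benign events correctly ignored

    def precision(self) -> float:
        """Share of alerts that were real attacks."""
        return self.tp / (self.tp + self.fp) if self.tp + self.fp else 0.0

    def recall(self) -> float:
        """Share of real attacks that raised an alert."""
        return self.tp / (self.tp + self.fn) if self.tp + self.fn else 0.0

    def cost(self, cost_fp: float, cost_fn: float) -> float:
        """Total cost of mistakes: false alarms plus missed attacks."""
        return self.fp * cost_fp + self.fn * cost_fn


# Made-up counts over 10,000 events for two hypothetical detectors.
signature = Confusion(tp=40, fp=5, fn=60, tn=9895)
baseline = Confusion(tp=85, fp=400, fn=15, tn=9500)

for name, c in [("signature", signature), ("baseline", baseline)]:
    print(name, round(c.precision(), 3), c.recall(), c.cost(50, 5000))
# signature 0.889 0.4 300250
# baseline 0.175 0.85 95000
```

Which detector "wins" depends entirely on the relative cost of a false alarm versus a missed attack.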
Jim’s Twitter: http://twitter.com/#!/jcollison
Contact Christian: christian@theaverageguy.tv
Contact the show at jim@theaverageguy.tv
Find this and other great Podcasts from the Average Guy Network at http://theaverageguy.tv
Music courtesy of Ryan King. Check out the Die Hard Cafe band and other original works at:
http://diehardcafe.bandcamp.com/ / http://cokehabitgo.tumblr.com/tagged/my-music
Some links may contain affiliate codes that benefit the Average Guy Podcast Network.